ABSTRACT



The Kepler Mission was launched on March 6, 2009 to perform a photomet- 
ric survey of more than 100,000 dwarf stars to search for terrestrial-size planets 
with the transit technique. The reliability of the resulting planetary candidate list 
relies on the ability to identify and remove false positives. Major sources of astro- 
physical false positives are planetary transits and stellar eclipses on background 
stars. We describe several new techniques for the identification of background 
transit sources that are separated from their target stars, indicating an astro- 
physical false positive. These techniques use only Kepler photometric data. We 
describe the concepts and construction of these techniques in detail as well as 
their performance and relative merits. 

Subject headings: Extrasolar Planets, Data Analysis and Techniques, Kepler Tele- 
scope 



1. Introduction 



The Kepler mission is designed to determine the frequency of Earth-size planets in 
and near the habitable zone of solar-like stars via the detection of photometric transits
(Borucki et al. 2010a; Koch et al. 2010a). Kepler surveys more than 100,000 late-type
dwarf stars in the solar neighborhood with visual magnitudes between 8 and 16 for > 4
years looking for transits of planets around those stars. There are several astrophysical 
phenomena that can cause a false-positive detection that mimics a planetary transit on a 
target star. Approximately 40% of the transit-like signals detected by Kepler that have been 
deemed Kepler Objects of Interest (KOIs) have been determined to be due to false positives. 

To increase the reliability of the determination of which KOIs are planetary candidates,
it is important to identify as many of these false positives as possible. Many KOIs have
transit signals that are too small for conventional ground-based followup, so false-positive
identification has to be based on Kepler data alone. This paper describes several distinct
but complementary methods for using Kepler data to detect cases where the source of a
transit-like event is offset from the target star's position on the sky. These background false 
positives make up a substantial fraction of all false positives, with most of the other false 
positives being due to grazing eclipsing stellar companions associated with the target star. 
At low Galactic latitudes, background false positives account for almost 40% of all Kepler
transit-like signals, with the fraction dropping to about 10% at high Galactic latitudes (see
Figure 1). Background false positives are detected in Kepler data by observing that the
pixels that change during the transit are distinct from the target star's pixels. Such cases
are referred to as active pixel offsets (APOs). The methods described in this paper cannot
detect all background transit sources, for example when the transit source is extremely close
to the target star on the sky, but they can identify a large percentage of background false
positives.

The techniques described in this paper rely on pixel data returned from the Kepler 
spacecraft. Without this pixel data the identification of background transit sources is much 
more difficult. Indeed, for dim target stars or for small planets with low SNR transits, 
ground-based followup typically will not suffice to identify background false positives. In 
such cases, background false positive identification would be impossible using stellar light 
curves alone. Without the pixels, the star hosting the transit signal cannot be determined. 
Without knowing the star hosting the transit, the object causing the transit cannot be 
characterized. Therefore the availability of the pixel data used to create the stellar light 
curves is a critical component of the success of any transit survey. This insight should drive 
the design of future transit survey missions. 

In the rest of this section we discuss background false positives in general, their identi- 
fication via pixel analysis and how that identification is used in the vetting of Kepler planet 
candidates. The bulk of this paper describes several techniques for performing pixel-level 
analysis to identify background false positives. In §2 we describe the photometric centroid
technique, and in §3 the use of difference images to localize the transit signal source. Pixel
correlations are described in §4. We briefly address the special case of saturated targets
in §5. §6 presents several perspectives on how well these techniques perform, with special
emphasis on comparing the photometric centroid and difference image techniques.

Fig. 1. The distribution of the fraction of transit signal sources that are offset from the
target star, indicating a background false positive. For low Galactic latitude almost 40% of
all Kepler KOIs are background false positives, while for mid to high Galactic latitudes the
fraction drops to about 10%. This figure is based on data from Batalha et al. (2012).

Throughout this paper we use several examples identified as Kepler objects of interest
(KOIs) (Borucki et al. 2011a,b; Batalha et al. 2010a, 2012; Burke et al. 2013). Some KOIs
are now valid candidates, while others have been determined to be false positives. We give
particular attention to two examples to illustrate our techniques: KOI-221, which is a Kepler
target where the transit source location is observationally coincident with the target, and
KOI-109, which is a Kepler target for which the transit source is clearly offset from the target
star. The list of KOIs and their properties can be found at the NASA Exoplanet Archive^1,
while the light curves and pixel data for all Kepler targets can be found at the Mikulski
Archive for Space Telescopes^2.



1.1. Background False Positives 

There are several astrophysical phenomena that can mimic a planetary transit on a
specified target star. Brown (2003) distinguishes 12 combinations of giant planets and stars in
eclipsing and transiting systems that can produce light curves mimicking a planet transiting 
a solitary primary star. Six of the combinations do not involve planets at all, and four others 
distort the transit light curve so that the size of the planet is indeterminate. 

In this paper we are concerned with those phenomena which are due to astrophysical 
sources that are not associated with the target star. These primarily include eclipsing binaries 
or large planet transits on stars that have flux in the pixels used to create the target star's 
light curve. Because of dilution from the target star, even deep background eclipsing binaries 
often cannot be identified from the target star's light curve alone. Analysis at the pixel level 
is required to identify the location of the transit signal source. We are particularly interested 
in cases where the transit signal's source is sufficiently separated from the target star that we 
can measure a statistically significant offset between the target star and the transit source. 

Additional sources of false positives that can be detected by the methods described in 
this paper include 

• Very wide multiple star systems, where the transit source is gravitationally bound
to the target star. When the separation between the target star and the companion
hosting the transit signal source is large enough the methods described in this paper
can detect the offset.

• Optical ghosts and electronic crosstalk (Caldwell et al. 2010) from planetary transits
or eclipsing binaries elsewhere on the Kepler focal plane. When the image of the ghost
or crosstalk falls on the target star's pixels but is sufficiently separated from the target
star these sources can be detected by the methods described in this paper. In addition,
optical ghosts can have very non-stellar morphologies. Transit signals due to optical
ghosts will exhibit these morphologies in several of the techniques described in this
paper.

1 http://exoplanetarchive.ipac.caltech.edu
2 http://archive.stsci.edu/kepler



Our basic strategy is to measure the location of the transit source on the sky, compare
that to the location of the target star, and declare the transit signal a false positive if the
transit source location is significantly offset (more than three standard deviations, written
$> 3\sigma$) from the target star location based on reliable data. All the methods of computing
these offsets described in this paper use $\chi^2$-minimizing (least-squares) methods. Assuming
Gaussian statistics, the squared normalized offsets follow a two-degree-of-freedom $\chi^2$
distribution, which produces offsets $> 3\sigma$ due to random fluctuations about 1.11% of the
time. As we will show in this paper, offset uncertainties follow an approximately Gaussian
distribution in a statistical sense, though the uncertainty around individual targets may not
be Gaussian.
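
The 1.11% figure can be checked directly. For two independent Gaussian offset components,
each normalized by its uncertainty, the squared radial offset follows a $\chi^2$ distribution
with two degrees of freedom, whose survival function is $\exp(-r^2/2)$. The following is a
minimal sketch of that check; the function name is ours and is used only for illustration.

```python
import math

def false_alarm_probability(n_sigma):
    """Chance that pure Gaussian noise yields a 2-D offset larger than n_sigma.

    Uses the survival function of a chi-square distribution with two
    degrees of freedom, evaluated at n_sigma**2.
    """
    return math.exp(-0.5 * n_sigma ** 2)

p3 = false_alarm_probability(3.0)
print(f"{100 * p3:.2f}%")  # prints "1.11%"
```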



1.2. Pixel Analysis to Identify the Location of the Transit Source 

As mentioned in Section 1.1, the background binary causing a transit signal can be very
faint, indeed significantly fainter than the general background or the wings of the target
star, and still mimic a planetary transit. Consider the case of an aperture that contains
only a target star with constant flux $F$ and a background binary, with otherwise negligible
sky background. If the background binary is $\Delta m$ magnitudes fainter than the target star,
then the flux ratio of the background star to the target star is $\Delta F = 100^{-\Delta m/5}$. If the
background binary has a fractional eclipse depth $d_{\rm back}$, then the total flux out of transit is
$F_{\rm out} = F + F\Delta F$. In transit the total flux is $F_{\rm in} = F + (1 - d_{\rm back})F\Delta F$. Therefore the
fractional observed depth in the aperture is

$$d_{\rm obs} = 1 - \frac{F_{\rm in}}{F_{\rm out}} = 1 - \frac{1 + (1 - d_{\rm back})\Delta F}{1 + \Delta F} = \frac{d_{\rm back}\,\Delta F}{1 + \Delta F}.$$

In the case of a 14th magnitude target star and a 22nd magnitude background eclipsing
binary with $d_{\rm back} = 0.5$ we get $d_{\rm obs} = 315$ ppm. A transit of this depth is easily detected
in Kepler data and would mimic the transit of a small planet, though the 22nd magnitude
background star would not be readily apparent in the Kepler data.
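
The dilution arithmetic above is easy to reproduce. The sketch below evaluates the observed
depth for the 14th/22nd magnitude example; the function and argument names are ours, chosen
only for illustration, and the sky background is taken to be negligible as in the text.

```python
def observed_depth(target_mag, background_mag, background_depth):
    """Fractional depth seen in an aperture holding a constant target star
    and a fainter eclipsing background star (no other flux sources)."""
    delta_m = background_mag - target_mag
    # Flux of the background star relative to the target star.
    flux_ratio = 100.0 ** (-delta_m / 5.0)
    return background_depth * flux_ratio / (1.0 + flux_ratio)

d_obs = observed_depth(target_mag=14.0, background_mag=22.0, background_depth=0.5)
print(f"{d_obs * 1e6:.0f} ppm")  # prints "315 ppm"
```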

There are several ways to use Kepler pixel data to measure the distance from the target 
star to the transit source. We focus on three classes of techniques, each of which has its
strengths and weaknesses. As we describe in detail below, none of these techniques works well
in all circumstances due to systematic error sources that vary from technique to technique,
but we find that the combination of these techniques covers the majority of cases where there
is a sufficiently large signal-to-noise ratio (SNR) to measure the transit source location. Our
focus is on techniques that can be reliably automated due to the large number of objects in 
the Kepler data. We would also, when possible, like to associate the transit source with a 
known star. Therefore we describe techniques that provide an estimate of the transit source 
location on the sky rather than simply determining if the transit source is at the target star 
location. 



Kepler collects pixels specific to each target (Bryson et al. 2010). A subset of these
pixels, called the photometric optimal aperture, is summed to create the light curve for the
target (see Figure 2). The pixel analysis in this paper uses either the optimal aperture plus
one halo of pixels, defined as any pixel adjacent to the optimal aperture (the photometric
centroid technique described in §2), or all pixels collected for a target (the difference image
technique described in §3). For most targets, Kepler pixel data is collected once every long
cadence (29.4 minutes), and for a subset of targets data is collected once every short cadence
(0.98 minutes). In this paper we limit our discussion to long cadence observations.

All of the methods described in this paper identify spatially separated false positives 
by comparing pixel values during in-transit cadences to values of the same pixels during 
out-of-transit cadences. 

Analysis of Kepler pixels to identify the location of the transit relative to the target star 
has to solve three problems: 

• Analyzing the Pixels Within a Cadence: There are various ways that the transit
source location can be inferred from pixel data. Some of these methods require the
identification of cadences that occur during transit and cadences that do not.

• Combining the Cadences Within a Quarter: The Kepler spacecraft rotates 90
degrees about the photometer boresight every ~93 days (Koch et al. 2010a). Each ~93
day period is referred to as a Quarter. While the Kepler focal plane is approximately
symmetric under these 90 degree rolls, a star falls on different CCD chips at different
pixel coordinates in different quarters. How in-transit and out-of-transit cadences
within a quarter are selected and combined varies from technique to technique.

• Combining the Cadences Across Quarters: Some of the techniques we discuss
operate within a single quarter and will deliver different results from quarter to quarter.
These results for each quarter must be combined to provide an overall measurement.

Fig. 2. Pixels collected for a Kepler target. All collected pixels are outlined by the solid
white line. The photometric optimal aperture is outlined with a white dot-dashed line. The
pixel values are shown by the pixel color. Asterisks give the location of known stars in the
field, including those just outside the collected pixels. For each star the Kepler Input Catalog
number and Kepler magnitude are given.

There are three classes of methods that we use to solve these problems: 

• Photometric Centroid Shift: Detection of a shift in the photometric centroid of the
flux in the pixels (see §2) that is correlated with the transit signal. This centroid shift
can be used to estimate the location of the transit source as described in §2.3.

• Difference Imaging: By constructing the difference of the in- and out-of-transit pixel
images, a direct image of the transit source can be constructed as described in §3.
The centroid of this image provides a direct measurement of the location of the transit
source. This method assumes that the only source of flux variation is the object
creating the transit signal.

• Pixel Correlation Images: When the transit signal can be detected in individual
pixels via correlation with the photometric transit signal, an image can be constructed
where the value of each pixel is the correlation value as described in §4. This is
an alternative method of creating a direct image of the transit source, whose centroid
provides the transit source location.

These three methods are in principle very similar, but have different responses to sys- 
tematics and noise, transit SNR, and field crowding. The use of all three methods provides 
increased sensitivity and confidence in the identification of background false positives, par- 
ticularly when the transit SNR is low. 



1.3. The Role of Offset Analysis in Planet Candidate Vetting 



The techniques described in this paper are used to decide whether or not a detected 
transit signal belongs on the Kepler planetary candidate list. The details of how these 
analyses are used have evolved over time, and are described in papers detailing the release of
planetary candidate lists (Borucki et al. 2010a, 2011a,b; Batalha et al. 2012; Burke et al.
2013). The general pattern is to identify those targets that show a significant offset between
the target star and the transit source relying primarily on the difference imaging method.
Those targets that have a borderline significant source offset or have other cause for concern
are examined using all the methods described in this paper, including manual examination
of the pixels. Targets that have a confirmed offset from the transit source are identified as
false positives. This disposition has changed over time for a small number of targets, as 
the techniques described in this paper have become more refined and as more data becomes 
available, resulting in greater measurement precision. 



2. Source Location from Photometric Centroid Shifts 

2.1. Computing Pixel Centroids 

The most traditional method for estimating the position of a light source is that of 
photometric centroids, also known as flux-weighted centroids. Photometric centroids measure 
the "center of light" of all flux in the pixels. While photometric centroids do not exactly 
measure the location of any particular star, it will be shown below that under idealized 
circumstances they can be used to compute the location of a transit source. 

The row and column photometric centroids of the pixels for each target are computed
for each cadence as

$$C_{\rm row} = \frac{\sum_{j=1}^{N} r_j b_j}{\sum_{j=1}^{N} b_j}, \qquad C_{\rm column} = \frac{\sum_{j=1}^{N} c_j b_j}{\sum_{j=1}^{N} b_j}, \qquad (1)$$

where $b_j$ is the flux in pixel $j$ at row and column $(r_j, c_j)$. If we denote the covariance matrix
of the pixel values $b_j$ as $C_{ij}$ (so the uncertainties in the pixel values are the square root of
the diagonals: $\sigma_j = \sqrt{C_{jj}}$), then the standard propagation of errors gives the uncertainty in
the photometric row centroid as

$$\sigma^2_{C_{\rm row}} = \sum_{j=1}^{N} \left(\frac{r_j - C_{\rm row}}{\sum_{k} b_k}\right)^2 \sigma_j^2 + \sum_{i=1}^{N} \sum_{j \neq i} \frac{(r_i - C_{\rm row})(r_j - C_{\rm row})}{\left(\sum_{k} b_k\right)^2}\, C_{ij}, \qquad (2)$$

with a similar formula for the uncertainty in the column centroid. We see that the sensitivity
of the centroid value $\sigma_{C_{\rm row}}$ is proportional to the square root of the elements of the covariance
matrix $C_{ij}$, in particular to the uncertainty in the pixel values $\sigma_j$, divided by the total flux in
the pixels $\sum_{j=1}^{N} b_j$. Therefore, photometric centroids are very sensitive to variations in pixel
value, in particular to shot noise and stellar variability.
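
As an illustration of the flux-weighted centroid and its propagated uncertainty, the sketch
below computes both for a single cadence. It is not the Kepler pipeline implementation: the
function name, the 3x3 test stamp and the shot-noise-only covariance are assumptions made
for this example. The uncertainty is obtained from first-order error propagation,
$\sigma^2 = J C J^T$, with $J_j = (r_j - C_{\rm row}) / \sum_k b_k$.

```python
import numpy as np

def flux_weighted_centroid(rows, cols, flux, cov):
    """Return (C_row, C_col, sigma_row, sigma_col) for one cadence.

    rows, cols, flux are 1-D arrays over the N pixels; cov is the N x N
    covariance matrix of the pixel fluxes.
    """
    rows = np.asarray(rows, dtype=float)
    cols = np.asarray(cols, dtype=float)
    flux = np.asarray(flux, dtype=float)
    cov = np.asarray(cov, dtype=float)

    total = flux.sum()
    c_row = np.sum(rows * flux) / total
    c_col = np.sum(cols * flux) / total

    # Partial derivatives of each centroid with respect to each pixel value.
    d_row = (rows - c_row) / total
    d_col = (cols - c_col) / total

    # First-order error propagation, including off-diagonal covariance terms.
    sigma_row = np.sqrt(d_row @ cov @ d_row)
    sigma_col = np.sqrt(d_col @ cov @ d_col)
    return c_row, c_col, sigma_row, sigma_col

# Hypothetical 3x3 stamp: a symmetric star with shot-noise-only errors,
# so the covariance is diagonal with variance equal to the flux.
rr, cc = np.meshgrid(np.arange(3), np.arange(3), indexing="ij")
stamp = np.array([[1.0, 2.0, 1.0],
                  [2.0, 8.0, 2.0],
                  [1.0, 2.0, 1.0]])
c_row, c_col, s_row, s_col = flux_weighted_centroid(
    rr.ravel(), cc.ravel(), stamp.ravel(), np.diag(stamp.ravel())
)
print(c_row, c_col, s_row, s_col)
```

For this symmetric stamp the centroid falls on the central pixel, and the uncertainty shrinks
as the total flux grows relative to the per-pixel noise.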

For photometric centroids computed in the Kepler pipeline, $j$ ranges over the optimal
aperture plus a single ring of pixels (sometimes called a halo). The result is a time series
containing the row and column centroids, called the centroid time series. The centroid shift is
defined as the centroid value for cadences out of transit, $C_{\rm out}$, subtracted from the centroid
value for cadences in transit, $C_{\rm in}$: $\Delta C = C_{\rm in} - C_{\rm out}$. We assume shifts in different cadences
are uncorrelated, so these shifts have an uncertainty given by $\sigma^2_{\Delta C} = \sigma^2_{C_{\rm in}} + \sigma^2_{C_{\rm out}}$.

It is very important to distinguish between the centroid shift, which measures how far 
the centroid moves between in- and out-of-transit cadences, and the source offset, which 
measures the separation of the target star from the transit source. As we will describe 
below, the centroid shift and source offset are related, but measure very different things. 
The centroid shift measures the change in the photometric centroid due to all changes in 
flux in the aperture. The source offset is derived from the centroid shift, but measures 
the separation between the target star and the transit source (which may or may not be a 
different star). In particular, because there are always background flux and field stars, the
centroid shift $\Delta C$ will always be non-zero even when the transit signal is on the target star.
In such cases the centroid shift can be relatively large while the source offset may be very 
close to zero. 

Low-frequency secular trends due to small, slow changes such as differential velocity
aberration, small pointing drifts and thermally induced focal length changes are common
in centroid time series (Christiansen et al. 2012). These trends are removed prior to the
analysis described in this section, for example by local median filtering using a window of
48 cadences.
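
A running-median detrend of this kind can be sketched as follows. This is an illustration
only: the edge padding and the exact placement of the even-length window are our
assumptions, and the filter used in practice may differ in those details.

```python
import numpy as np

def median_detrend(series, window=48):
    """Subtract a running median computed over `window` cadences.

    The series is padded by repeating its end values so that the output
    has the same length as the input (an assumption of this sketch).
    """
    series = np.asarray(series, dtype=float)
    left = window // 2
    right = window - left - 1
    padded = np.pad(series, (left, right), mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    return series - np.median(windows, axis=1)

# Hypothetical centroid series: constant level with a three-cadence excursion.
centroid_like = np.full(200, 5.0)
centroid_like[100:103] = 4.0
residual = median_detrend(centroid_like)
print(residual[98:105])
```

Because the excursion is much shorter than the window, the running median follows the
baseline and the excursion survives in the residual at its full amplitude.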

To facilitate combining the centroids across quarters, the centroid time series is converted
to celestial right ascension (RA) and declination (Dec) using the Kepler focal plane geometry
model in combination with motion polynomials that capture local variations in the focal
plane geometry model (Tenenbaum and Jenkins 2010). In these coordinates the centroid
shift $\Delta C$ is expressed in seconds of arc.

When the centroid shift $\Delta C$ is large enough, it can be taken to indicate that the transit
source is not on the target star. Using $\Delta C$ directly to make this determination must be
done with great care, however. $\Delta C$ will be smallest when the target star is the source of
the transit, the target star is isolated, residual background flux is small after background
correction, and the target star is near the geometric center of the centroided pixels. This
is rarely the case, however, so even when the target star is the source of the transit there
will be a non-trivial centroid shift. A larger centroid shift that is correlated with the time
of transit is an indicator that the transit source may not be the target star. Determining
whether a centroid shift indicates that the transit source is not the target star is difficult,
however, and depends on the details of other flux sources in the target's pixel aperture. In
§2.3 we describe how to use the centroid shift to estimate the location of the source of the
centroid signal, which is a more robust method for determining whether the transit source
is the target star than using the centroid shift alone.

A graphical method showing the correlation between the centroid shift and the transit 
signal is to plot the median-detrended centroid time series against the normalized, median-
detrended light curve flux value. The result is a cloud plot, shown in Figure 3. Most points
in a cloud plot are out-of-transit cadences and form a cluster around (0,0). The size of 
the cloud reflects the sensitivity of the photometric centroid computation to noise in the 
pixel values. When there is no centroid shift associated with transits, the points in transit 
(with negative normalized flux) fall directly below the out-of-transit points. When there 
is a centroid shift associated with the transit, points in transit will fall to the side of the 
out-of-transit cloud. Seeing sideways motion of the in-transit points as shown in the right 
panel of Figure 3 indicates a centroid shift associated with the transit. This suggests that
the transit source may be offset from the target star. As explained above, care must be taken 
when interpreting cloud plots because there may be a non-trivial centroid shift correlated 
with the transit even when the target star is the transit source. 



2.2. Correlating Centroid Motion with the Transit Model 

The centroid time series is sensitive to photometric noise, so quantitatively measuring 
the correlation of the centroid shift with the photometric transit signal can be difficult, par- 
ticularly for low SNR transits. A simple approach is to identify all in- and out-of-transit 
cadences, and compute the average (or median) in- and out-of-transit centroid values. The 
average centroid shift is then given by the difference of the in- and out-of-transit average 
centroid locations. This method encounters many difficulties, however: quarter-to-quarter 
differences in aperture shape will introduce systematic errors, and non-transit related
variability will degrade these averages as measures of transit-related shifts. A better method
is to fit a transit model computed during data validation (Wu et al. 2010) to the centroid
time series. This will provide a more robust measurement of $\Delta C$.

Fig. 3. Example cloud plots where the normalized residual flux (y-axis) is plotted against
the centroid shift (x-axis). Each point plots the normalized, median-detrended flux value
against the median-detrended RA (blue crosses) or Dec (red circles) centroid time series in
a single long cadence. In both figures most points are from out-of-transit cadences and form
a cloud around (0,0). Left: When the transit is on an isolated target star (KOI-221 in this
example), the centroid does not shift when in transit, so in-transit points are directly below
the out-of-transit points. Right: When the transit is on an object offset from the target
(KOI-109 in this example), the in-transit centroids are shifted relative to the out-of-transit
centroids and appear below to one side, indicating a strong possibility of a background false
positive. In this example the Dec centroid components show a shift while the RA components
do not, indicating that the transit source is offset in the Dec direction.

In this section we define the centroid shift time series $\Delta C_n = C_n - C_{\rm out}$, where $C_{\rm out}$
is the average out-of-transit centroid and $n$ labels the cadence. We assume
that the transit model has been whitened to remove secular variations such as those due to
pointing drift and stellar variability (Wu et al. 2010), in which case the centroid shift time
series $\Delta C_n$ must be whitened in the same way. We compute a least-squares fit of the centroid
shift time series $\Delta C_n$ to the transit model $M_n$ multiplied by a constant $\gamma$, weighted by the
centroid uncertainties $\sigma_n$. This fit is most easily done by requiring that the transit model and
the centroid shift time series both have zero mean when the transit is not occurring. This
implies that the transit model $M_n = 0$ for out-of-transit cadences. When this is the case we
minimize

$$\chi^2 = \sum_{n=1}^{N} \frac{1}{\sigma_n^2} \left(\Delta C_n - \gamma M_n\right)^2. \qquad (3)$$

This least-squares minimization problem has the solution

$$\gamma = \frac{\sum_{n=1}^{N} \Delta C_n M_n / \sigma_n^2}{\sum_{n=1}^{N} M_n^2 / \sigma_n^2}. \qquad (4)$$

Examples of this fit are given in Figures 4 and 5.

Assuming that the centroid and transit model uncertainties are uncorrelated over time,
and neglecting uncertainties in the transit model values, the uncertainty in $\gamma$ is

$$\sigma_\gamma = \frac{1}{\sqrt{\sum_{n=1}^{N} M_n^2 / \sigma_n^2}}. \qquad (5)$$



Only in-transit cadences contribute to the computation of $\gamma$ and $\sigma_\gamma$ because $M_n = 0$ for
out-of-transit cadences. Because $M_n$ is fit to the whitened and normalized flux light curve, it
has unit variance, so $\gamma$ is in the same units as $\Delta C_n$ and directly gives an estimate of the
in- vs. out-of-transit shift: $\Delta C \approx \gamma$. When the centroid shifts are in RA and Dec coordinates,
all quarters of data can be simultaneously fit. From Equation (5) we see a $\sqrt{N_{\rm in}}$ reduction
in the uncertainty, where $N_{\rm in}$ is the total number of in-transit cadences, so combining many
quarters increases the precision of the estimate of $\Delta C$ in each coordinate.
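
The weighted fit of a centroid shift time series to a scaled transit model has a closed-form
solution, $\gamma = \sum_n (\Delta C_n M_n/\sigma_n^2) / \sum_n (M_n^2/\sigma_n^2)$ with
$\sigma_\gamma = (\sum_n M_n^2/\sigma_n^2)^{-1/2}$, which the sketch below evaluates. The function
name and the synthetic data are ours. For simplicity the sketch uses a box model of unit
amplitude in transit and zero elsewhere, which is an assumption made for illustration and
not the whitened transit model itself.

```python
import numpy as np

def fit_centroid_shift(delta_c, model, sigma):
    """Weighted least-squares amplitude of `model` in `delta_c`.

    Returns (gamma, sigma_gamma). Cadences where the model is zero
    (out of transit) do not contribute to either quantity.
    """
    delta_c = np.asarray(delta_c, dtype=float)
    model = np.asarray(model, dtype=float)
    weights = 1.0 / np.asarray(sigma, dtype=float) ** 2

    denom = np.sum(weights * model ** 2)
    gamma = np.sum(weights * delta_c * model) / denom
    sigma_gamma = 1.0 / np.sqrt(denom)
    return gamma, sigma_gamma

# Synthetic example: 20 cadences, 4 of them in transit, with a noiseless
# centroid shift of 0.5 (arbitrary units) and a per-cadence uncertainty of 0.1.
model = np.zeros(20)
model[8:12] = 1.0
delta_c = 0.5 * model
sigma = np.full(20, 0.1)

gamma, sigma_gamma = fit_centroid_shift(delta_c, model, sigma)
print(gamma, sigma_gamma)
```

With four in-transit cadences the uncertainty is half the single-cadence value, which is the
$\sqrt{N_{\rm in}}$ reduction noted in the text.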

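Once the shift is estimated in RA and Dec (in seconds of arc), the shift distance is
simply

$$D = \sqrt{\Delta C_{\rm RA}^2 + \Delta C_{\rm Dec}^2}, \qquad (6)$$

with uncertainty

$$\sigma_D = \frac{\sqrt{\Delta C_{\rm RA}^2\, \sigma_{\Delta C_{\rm RA}}^2 + \Delta C_{\rm Dec}^2\, \sigma_{\Delta C_{\rm Dec}}^2}}{D}. \qquad (7)$$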
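
A small numerical illustration of the shift distance and its uncertainty follows. It assumes
standard first-order error propagation with uncorrelated RA and Dec shift uncertainties;
the function name and the example values are hypothetical.

```python
import math

def shift_distance(dc_ra, dc_dec, sigma_ra, sigma_dec):
    """Return (D, sigma_D) for RA and Dec centroid shifts in the same units.

    Assumes uncorrelated uncertainties. D must be non-zero, since the
    propagated uncertainty is undefined for a vanishing shift.
    """
    distance = math.hypot(dc_ra, dc_dec)
    sigma_distance = math.sqrt((dc_ra * sigma_ra) ** 2
                               + (dc_dec * sigma_dec) ** 2) / distance
    return distance, sigma_distance

# Hypothetical shifts of 3 and 4 milliarcseconds, each with 0.1 mas uncertainty.
D, sigma_D = shift_distance(3.0, 4.0, 0.1, 0.1)
print(D, sigma_D, D / sigma_D)
```

The ratio $D/\sigma_D$ is the quantity compared against the $3\sigma$ threshold.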

A high-level detection statistic indicating whether a detected shift is statistically
significant is also computed. This statistic measures the probability that the detected shift is
due to an actual signal rather than a statistical fluctuation in white noise by subtracting the
residual $\chi^2$ from the signal $\chi^2$. From this statistic a significance metric is constructed that
is normalized to the range [0, 1], where 1 means that there is no detected shift and 0 means
that the shift is highly significant. This is equivalent to Equation (4) of Wu et al. (2010),
which in our notation is given by

Fig. 4. An example of a fit of the centroid time series to the transit model for a case when
the transit source is at the same location as the target star (KOI-221). Top: the detrended
flux light curve over all quarters folded on the transit period, with a closeup on the transit.
Middle and Bottom: the RA and Dec detrended centroid shifts $\Delta C$ for the same cadences
in milliarcseconds. There is no apparent change in the centroid positions at the time of the
transit.

Fig. 5. An example of a fit of the centroid time series to the transit model for a case
where the transit source is offset from the target star (KOI-109). Top: the detrended flux
light curve over all quarters folded on the transit period, with a closeup on the transit.
Middle and Bottom: the RA and Dec centroid shifts $\Delta C$ for the same cadences in
milliarcseconds. There is a readily apparent change in the centroid shifts at the time of the transit,
particularly in Dec. The transit model that best fits the flux light curve is superimposed on
each centroid shift plot, scaled by the coefficient $\gamma$ in Equation (3). The value of $\Delta C = \gamma$
in declination is about 0.1 milliarcseconds. The poor model fit is due to the fact that
the transit source for KOI-109 is in fact a deep eclipsing binary while the model assumes a
planetary transit.

2.2.1. The impact of crowding and variability on the centroid shift estimate

The computation of the in-transit centroid shift assumes that the transiting object is the 
only source of time varying flux that is correlated with the transit signal in the target star's 
pixels. While this is usually a reasonable assumption, it is sometimes violated, introducing 
systematic error into the centroid shift estimate. A dramatic example is KOI-1860, whose
pixels are shown in Figure 6. In this case there is a field star that is 2.7 magnitudes brighter
than the target star at the edge of the collected pixels. Examination of the pixel flux time
series shows that this bright star has moderately high variability on short time scales. In
addition, because this bright star is at the edge of the collected pixels and is only partially
captured, there are strong variations in flux due to spacecraft pointing jitter. The effect of
these variations on the centroid time series is shown in Figure 7. These variations are on
a time scale that occasionally correlates with the transit signal, leading to a small spurious
measured centroid shift in the fit of Equation (3). The reconstructed transit source location using this
spurious shift measurement, described in §2.3, indicates a transit source separated from the
target star by about 4 arcseconds. As we will see in §3.4.1, however, the PRF-fit technique
provides strong evidence that the transit source is only about a third of an arcsecond from
the target star.

2.3. Estimating the Transit Source Location from Centroid Motion 

Photometric centroids are the weighted average of all flux in the target star's pixels, so
they do not provide direct information about the location of the target star or the transit
source. In particular, as explained in §2.1, a statistically significant shift does not necessarily
imply that the transit source is offset from the target star. In Appendix A we derive a
formula approximating the location of the transit source from the observed transit depth
(based on the light curve created by summing the pixels used for centroiding), the
out-of-transit centroid location $C^{\rm out}$ and the centroid shift $\Delta C$. Remarkably, this
formula applies in the presence of an arbitrary background signal, including any number of field
stars in or near the aperture, and does not depend on the brightness of those stars. The formula
assumes only that the flux from the transit source is the only time-varying signal in the aperture,
so that neither the other stars nor the background flux vary in brightness. This assumption is never
exactly true, but in many cases it is very nearly true, and in these cases we can estimate
the transit source location. We can then compare the transit source location to the catalog
location of the target star to estimate the offset of the transit source from the target star.
We assume that the centroids are provided in RA and Dec coordinates, denoted $(\alpha, \delta)$.

Fig. 6. — The pixels collected for KOI-1860 in quarter 10. The pixels are dominated by
the field star KIC 4157320, which is 2.7 magnitudes brighter than the target star. KIC
4157320 has strong variability. In addition, because it is only partially captured in the
pixels, spacecraft pointing variations are apparent in the pixel flux light curves.

Fig. 7. — The (not folded) flux and photometric centroid time series for KOI-1860 in quarter
10. The vertical red lines indicate times of transit. The bright field star at the edge of the
aperture (see Figure 6) causes strong variations in the centroid time series due to the intrinsic
variability of that star combined with spacecraft pointing jitter, which is exacerbated by that
star being only partially captured in the pixels. These variations cause a spurious centroid
shift that is correlated with the transit signal.

We denote the RA and Dec components of the average out-of-transit centroid as
$(C_\alpha^{\rm out}, C_\delta^{\rm out})$, and the centroid shift measured as described in §2.2 as
$(\Delta C_\alpha, \Delta C_\delta)$. If the observed transit depth is $d_{\rm obs}$, then as shown in
Appendix A the centroid of the flux from the transit source that falls in the aperture is at RA and Dec

$$\alpha_{\rm transit} = C_\alpha^{\rm out} - \left(\frac{1}{d_{\rm obs}} - 1\right)\frac{\Delta C_\alpha}{\cos\delta}, \qquad \delta_{\rm transit} = C_\delta^{\rm out} - \left(\frac{1}{d_{\rm obs}} - 1\right)\Delta C_\delta \tag{9}$$

(see Figure 8). When all flux from the transit source is captured in the aperture, this
centroid gives the location of the transit source.
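
As an illustration of Equation (9), the following minimal Python sketch computes the transit source location from an out-of-transit centroid, a centroid shift and an observed depth. The function name `transit_source_location`, the choice to evaluate cos(Dec) at the out-of-transit centroid, and the synthetic two-star configuration are assumptions made for this example; they are not taken from the Kepler pipeline.

```python
import math

def transit_source_location(c_out, delta_c, d_obs):
    """Estimate the (RA, Dec) of the transit source, in degrees, via Equation (9).

    c_out   -- (RA, Dec) of the average out-of-transit centroid, degrees
    delta_c -- centroid shift (in-transit minus out-of-transit), degrees on
               the sky; the RA component already includes the cos(Dec) factor
    d_obs   -- observed fractional transit depth, 0 < d_obs < 1
    """
    if not 0.0 < d_obs < 1.0:
        raise ValueError("d_obs must be a fractional depth between 0 and 1")
    scale = 1.0 / d_obs - 1.0
    # Assumption: cos(Dec) is evaluated at the out-of-transit centroid.
    cos_dec = math.cos(math.radians(c_out[1]))
    ra = c_out[0] - scale * delta_c[0] / cos_dec
    dec = c_out[1] - scale * delta_c[1]
    return ra, dec


# Synthetic check: a bright target plus a faint background eclipsing binary.
target, source = (290.0, 44.5), (290.002, 44.501)
b_target, b_source, d_back = 1000.0, 10.0, 0.5

def centroid(b_src):
    """Flux-weighted centroid of the two stars for a given background flux."""
    total = b_target + b_src
    return tuple((b_target * t + b_src * s) / total for t, s in zip(target, source))

c_out = centroid(b_source)
c_in = centroid(b_source * (1.0 - d_back))
d_obs = d_back * b_source / (b_target + b_source)
cos_dec = math.cos(math.radians(c_out[1]))
delta_c = ((c_in[0] - c_out[0]) * cos_dec, c_in[1] - c_out[1])
ra, dec = transit_source_location(c_out, delta_c, d_obs)
print(ra, dec)  # close to the background star position, up to rounding error
```

In this idealized case, where all flux is captured and only the background star varies, the estimate lands on the background star even though the out-of-transit centroid is dominated by the target.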

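The formal uncertainty in the source position is given in terms of the centroid uncertainties
$\sigma_{C_\alpha^{\rm out}}$ and $\sigma_{C_\alpha^{\rm in}}$ and the depth uncertainty $\sigma_d$ by

$$\sigma^2_{\alpha_{\rm transit}} = \sigma^2_{C_\alpha^{\rm out}} + \left(\frac{1}{d_{\rm obs}} - 1\right)^2 \frac{\sigma^2_{C_\alpha^{\rm out}} + \sigma^2_{C_\alpha^{\rm in}}}{\cos^2\delta} + \frac{\Delta C_\alpha^2\, \sigma_d^2}{d_{\rm obs}^4 \cos^2\delta} \tag{10}$$

with the analogous expression, without the $\cos\delta$ factors, for $\sigma_{\delta_{\rm transit}}$.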




These uncertainties do not account for systematic error due to other sources of varying flux.
For $d_{\rm obs} \ll 1$ Equation (9) reduces to

$$\alpha_{\rm transit} \approx C_\alpha^{\rm out} - \frac{\Delta C_\alpha}{d_{\rm obs}\cos\delta}, \qquad \delta_{\rm transit} \approx C_\delta^{\rm out} - \frac{\Delta C_\delta}{d_{\rm obs}}, \tag{12}$$

the approximation given in Equation (2) of Wu et al. (2010). The uncertainties are similarly
approximated by replacing $(1/d_{\rm obs} - 1)$ by $1/d_{\rm obs}$. This approximation has an error
that is proportional to $d_{\rm obs}$, which is very small for most Kepler planetary candidates.

Once we have the centroid source location from Equation (9), we compare it with the
target location to determine the source offset. The target star location cannot, however, be
reliably determined from the centroid time series, so we take the target star position from the 
Kepler Input Catalog. This choice potentially introduces new sources of systematic error, 
particularly due to unknown proper motion. 

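Given the target star's catalog location $(\alpha_{\rm target}, \delta_{\rm target})$, we can compute the
target offset and its uncertainty from the offset components
$\Delta\alpha = (\alpha_{\rm transit} - \alpha_{\rm target})\cos\delta$ and
$\Delta\delta = \delta_{\rm transit} - \delta_{\rm target}$:

$$D = \sqrt{\Delta\alpha^2 + \Delta\delta^2}, \qquad \sigma_D = \frac{\sqrt{\Delta\alpha^2\, \sigma_{\Delta\alpha}^2 + \Delta\delta^2\, \sigma_{\Delta\delta}^2}}{D}, \tag{13}$$

where $\sigma_{\Delta\alpha} = \sqrt{\sigma^2_{\alpha_{\rm transit}} + \sigma^2_{\alpha_{\rm target}}}\, \cos\delta$ and
$\sigma_{\Delta\delta} = \sqrt{\sigma^2_{\delta_{\rm transit}} + \sigma^2_{\delta_{\rm target}}}$.

We can now determine whether the transit source is statistically significantly offset from the
target star by observing whether $D > 3\sigma_D$.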
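
The offset test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `source_offset`, the use of the target declination for the cos(Dec) factor, the conversion to arcseconds, and the fallback returned when D is exactly zero (where the expression for the uncertainty of D is undefined) are choices made for this example.

```python
import math

ARCSEC_PER_DEG = 3600.0

def source_offset(transit, target, sigma_transit, sigma_target):
    """Offset distance D and its uncertainty sigma_D, both in arcseconds.

    transit, target -- (RA, Dec) positions in degrees
    sigma_*         -- (sigma_RA, sigma_Dec) position uncertainties in degrees
    """
    # Assumption: cos(Dec) is evaluated at the target star's declination.
    cos_dec = math.cos(math.radians(target[1]))
    d_alpha = (transit[0] - target[0]) * cos_dec * ARCSEC_PER_DEG
    d_delta = (transit[1] - target[1]) * ARCSEC_PER_DEG
    s_alpha = math.hypot(sigma_transit[0], sigma_target[0]) * cos_dec * ARCSEC_PER_DEG
    s_delta = math.hypot(sigma_transit[1], sigma_target[1]) * ARCSEC_PER_DEG
    distance = math.hypot(d_alpha, d_delta)
    if distance == 0.0:
        # Assumption: sigma_D is undefined at D = 0, so fall back to the
        # quadrature sum of the component uncertainties.
        return 0.0, math.hypot(s_alpha, s_delta)
    sigma_d = math.sqrt((d_alpha * s_alpha) ** 2 + (d_delta * s_delta) ** 2) / distance
    return distance, sigma_d

def is_significantly_offset(distance, sigma_d, n_sigma=3.0):
    """True when the offset exceeds n_sigma times its uncertainty."""
    return distance > n_sigma * sigma_d


# Example: a source 3.6 arcsec north of the target with a 0.36 arcsec uncertainty.
D, sigma_D = source_offset((290.0, 44.501), (290.0, 44.5), (0.0, 1.0e-4), (0.0, 0.0))
print(D, sigma_D, is_significantly_offset(D, sigma_D))
```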
Fig. 8. — An illustration of the relationship between centroids, centroid shifts, the
background eclipsing binary causing the transit signal, and the target star in Equation (9) for
an otherwise empty aperture. The photometric centroid when a transit is not occurring is
given by $C^{\rm out}$ (filled circle). If the transit is due to an eclipse on the background star, during
the eclipse the centroid will shift towards the target star to $C^{\rm in}$ (open circle). The resulting
transit shift is $\Delta C = C^{\rm in} - C^{\rm out}$. Applying Equation (9) gives an estimate of the transit
source location (filled square), which in an idealized case will correspond to the location of
the transit source.



2.3.1. Systematic errors in the source position estimate 



As discussed in Appendix A, the above analysis does not describe the current implementation
in the Kepler pipeline. The Kepler pipeline uses the photometrically optimal aperture
(Bryson et al. 2010) to compute the transit depth and the optimal aperture plus one ring
of surrounding pixels to compute the centroid (see Figure 9). This use of different pixel
apertures to compute the depth and centroid invalidates the above analysis when significant
flux from the transit source falls outside the optimal aperture. Because optimal apertures
can be as small as a single pixel, an overshoot of the estimated source position is possible when
the transit source and target star are separated by more than one Kepler pixel (3.98 arcseconds).



Fig. 9. — The optimal aperture compared with the pixels used for photometric centroiding.
The optimal aperture pixels are outlined by the dot-dashed line, while the pixels used for
photometric centroiding are outlined by the dashed line.



In the typical background false positive case, when the transit source is associated with a
field star that is significantly dimmer than the target star, the observed depth in the optimal
aperture (the depth computed by the Kepler pipeline) will be smaller than the depth that
would have been observed using the centroided pixels. This will result in an overestimate
of the distance of the transit source from the out-of-transit photometric centroid $C^{\rm out}$ in
Equation (9). Occasionally the field star associated with the transit source will be brighter
than the target star, so the flux from the field star dominates the centroids. In this case
the observed depth in both apertures will be similar, resulting in less of an overshoot. This
behavior is observed in §6.1. See Appendix A for details.

The dependence of the source offset estimate on the ratio of the brightness of the
background star to that of the target star is shown in Figure 10. This example is similar
to that in Figure El where the background star causing the transit signal is outside the 
optimal aperture and mostly, but not completely, captured in the centroided pixels. When 
the background star is dim, the estimated transit source overshoots the correct offset. When 
the background star is significantly brighter than the target star then the flux from the 
background star dominates the depth estimate, so the depth based on the centroided pixels 
is about the same as the depth based on the optimal apertures. But because the background 
star is close to the edge of the centroided pixels not all flux from the background star is 
captured. Therefore the source offset estimate in Equation (9) gives the centroid of the flux
in the pixels from the background star, which is closer to the target star than the background 
star itself. 



3. Difference Imaging 

3.1. The Concept of Difference Imaging 

The difference image technique is based on the insight that subtracting the in-transit 
pixel values from the out-of-transit pixel values gives an image that shows only those pixels
that have changed during the transits. Further, if the changes during transits are due to a 
change in brightness of a star (as is the case for a planetary transit or an eclipsing binary) 
then the bright pixels in the difference image will be those of that star with flux given by 
the fractional transit depth times the flux of that star. 

More precisely, consider a set of pixels that contain flux from $M$ stars, labeled by
the index $j$, at locations $(\alpha_j, \delta_j)$ with flux $b_j$ (we neglect background flux in this simple
analysis). The PSF will distribute the flux from each of these stars over several pixels. We
express the flux on the pixel at row $r$ and column $c$ due to star $j$ by the unit flux function
$f(\alpha_j, \delta_j, r, c)$ (so the sum over all pixels of $f(\alpha_j, \delta_j, r, c)$ is 1). Then the out-of-transit
pixel values due to all stars will be given by

$$F_{\rm out}(r, c) = \sum_{j=1}^{M} b_j f(\alpha_j, \delta_j, r, c).$$

If star $k$ has a transit of depth $d_{\rm back}$ then during mid-transit the pixel values would be given by

$$F_{\rm in}(r, c) = \sum_{j=1,\, j \neq k}^{M} b_j f(\alpha_j, \delta_j, r, c) + (1 - d_{\rm back})\, b_k f(\alpha_k, \delta_k, r, c).$$

In the ideal case where the only flux change is in star $k$, the difference image will be

$$F_{\rm out}(r, c) - F_{\rm in}(r, c) = d_{\rm back}\, b_k f(\alpha_k, \delta_k, r, c),$$

which is exactly the image of star $k$ with flux $d_{\rm back} b_k$.
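
The cancellation of non-varying stars can be checked numerically. The sketch below is a toy model rather than anything from the Kepler pipeline: the Gaussian `unit_flux` function is a hypothetical stand-in for the unit flux function f, star positions are given directly in pixel coordinates, and the star fluxes and transit depth are invented for the example.

```python
import math

def unit_flux(x0, y0, n_rows, n_cols, width=1.0):
    """Gaussian stand-in for f: fraction of a star's flux in each pixel (sums to 1)."""
    raw = [[math.exp(-((r - y0) ** 2 + (c - x0) ** 2) / (2.0 * width ** 2))
            for c in range(n_cols)] for r in range(n_rows)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]

def pixel_image(stars, n_rows, n_cols):
    """Sum b_j * f_j over all stars, each given as (column, row, flux)."""
    image = [[0.0] * n_cols for _ in range(n_rows)]
    for x0, y0, flux in stars:
        f = unit_flux(x0, y0, n_rows, n_cols)
        for r in range(n_rows):
            for c in range(n_cols):
                image[r][c] += flux * f[r][c]
    return image

def brightest_pixel(image):
    """(row, column) of the largest pixel value."""
    return max(((r, c) for r in range(len(image)) for c in range(len(image[0]))),
               key=lambda rc: image[rc[0]][rc[1]])

n_rows, n_cols = 8, 10
stars = [(3.0, 3.0, 1.0e5), (6.0, 5.0, 2.0e3)]  # bright target, faint background star
k, d_back = 1, 0.3                               # the background star (index 1) is eclipsed

in_transit = [(x, y, b * (1.0 - d_back) if j == k else b)
              for j, (x, y, b) in enumerate(stars)]
f_out = pixel_image(stars, n_rows, n_cols)
f_in = pixel_image(in_transit, n_rows, n_cols)
diff = [[o - i for o, i in zip(row_out, row_in)] for row_out, row_in in zip(f_out, f_in)]

print(brightest_pixel(f_out))  # pixel under the target star
print(brightest_pixel(diff))   # pixel under the eclipsed background star
```

Although the target is 50 times brighter than the background star in this toy case, it cancels in the difference image, whose brightest pixel falls on the eclipsed star.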

Difference images provide direct information about the location of the transit source, as
opposed to the use of photometric centroids in §2.1, where the source location is inferred.

Fig. 10. — The photometric-based transit source offset as a function of the ratio of the
background source brightness to the target star brightness. The example shown here is for
a 0.1% transit on a background star that is 10 arcseconds from the target star. The optimal
aperture in this case is 2 × 2 Kepler pixels (7.96 × 7.96 arcseconds), so the background
star is outside the optimal aperture in the halo pixels. Because significant flux from the
background star falls outside the captured pixels, the source position estimate (Equation
(9)) underestimates the actual position of the background star.

Example pixel images are shown in Figures 11 and 12. In Figure 11 we see an example of
a star (KOI-221) for which there is no apparent offset between the target star and the transit
source. In this case the difference image looks much like the in- and out-of-transit images,
likely because the target star is itself the source of the transit (and there are no other stars
of comparable brightness in the out-of-transit image). Therefore the only difference between
the difference image and the out-of-transit image is the flux level in the pixels. Figure 12
shows a case (KOI-109) where the difference image is dramatically different from the
out-of-transit image, and appears as a star image coincident with the dim unclassified star KIC
4752452. Because KIC 4752452 is unclassified, it does not have a Kepler magnitude. In this
case the pixel data show that the transit source is clearly not on the target star.

When the transit SNR is high the pixel images appear as in Figures 11 (SNR = 378)
and 12 (SNR = 101), with very well defined star-like difference images. When the SNR is
high and the transit is on the target star, as in Figure 11, we expect the difference image
to look like the out-of-transit image. Figure 13 shows an example of a low SNR transit on
KOI-2949 with an SNR of 11. In this figure the difference image looks significantly different
from the out-of-transit image, so a cursory inspection of only this quarter's out-of-transit
and difference images would indicate a significant offset. But examination of other quarters
finds offsets in other directions in some quarters and much smaller offsets in other quarters.
When the SNR is low, the difference image is subject to pixel-level systematics that can
pollute the difference image. As we will see in §3.4, combining quarters puts the transit
source statistically close to the target. When the SNR is very low, the difference image is
dominated by noise because the transit does not have sufficient signal in individual quarters.

When the offset is as dramatic as that in Figure 12, cursory visual inspection is sufficient
to determine that the transit signal does not occur on the target star. We are interested, 
however, in measuring smaller offsets that may not be so visually obvious. In addition we 
wish to have the ability to automatically measure and detect such transit-source offsets for 
thousands of transit signals. This can be done by measuring the centroid of the difference 
image and comparing with estimates of the target star position. This approach encounters 
several difficulties: 

• Difference images can be noisy, particularly for low SNR transits. This is particularly
a problem for transits near spacecraft thermal events and in multiple planet systems,
where the transit signals from multiple planets can interfere with each other.

• Determination of the location of the target star should use the same method as the
difference image to minimize the impact of systematic measurement errors.

• The structure of the background signal for the target star due to crowding will be very
different from the difference image background signal because non-variable background
stars will cancel out in the difference image.

• In different quarters stars fall in different places on different pixels and pixel apertures
vary from quarter to quarter. Therefore the offsets measured in different quarters can
be different.

Fig. 11. — Example pixel images for KOI-221 in quarter 7, which shows no indication that
the transit is not on the target star. In all figures, the dotted white line borders the pixels
of the optimal aperture, while the solid white line borders all pixels collected for this target.
Known stars are shown as white asterisks, with each star's KIC catalog number and Kepler
magnitude. Upper Right: the averaged out-of-transit pixel image. Lower Left: the in-transit
pixel image. Upper Left: the difference image = out-of-transit pixel image - in-transit pixel
image. Lower Right: the difference image normalized by pixel value uncertainty. In this case
the difference image appears identical to the in- and out-of-transit images, which indicates
that the transit source is coincident with the target star.

Fig. 12. — Example pixel images for KOI-109 in quarter 4, which shows indications that
the transit is not on the target star. Upper Right: the averaged out-of-transit pixel image.
Lower Left: the in-transit pixel image. Upper Left: the difference image = out-of-transit
pixel image - in-transit pixel image. Lower Right: the difference image normalized by pixel
value uncertainty. In this case the difference image appears to be very different from the in-
and out-of-transit images, which indicates that the transit source is coincident with the star
KIC 4752452.

Fig. 13. — Pixel images for a low SNR transit on KOI-2949 with an SNR of 11. The difference
image appears significantly different from the out-of-transit image in this quarter, indicating
that the transit source is not on the target star. But other quarters show the transit source
in other locations including on the target star. This situation is typical for low SNR transits,
and more reliable measurement of the transit source location can be attained by combining
the quarters as described in §3.4. In this example the combined quarter result indicates that
the transit location is statistically consistent with the target star.

We address these difficulties through the following strategies: 

• Careful construction of the in- and out-of-transit images, described in §3.2, so the
difference image is as clean as possible.

• Determining the location of stars in the difference or out-of-transit image via PSF-type
fitting to the pixel data using the Kepler Pixel Response Function (PRF), described
in §3.3, which is more robust against noise than photometric centroids.

• Either carefully averaging the quarterly offsets (§3.4.1) or performing a joint
multi-quarter fit (§3.4.2).



3.2. Construction of in- and out-of-transit and difference pixel images 

Our goal is to measure the location of the change in the flux due to the transit signal. 
Therefore we want to create a difference image by subtracting pixel flux in transit from pixel 
flux near transit. We want to avoid pixel flux away from the transit so changes due to stellar
variability are less likely to enter into the difference image. We also want to avoid changes 
in flux that are not related to the transit under examination, such as spacecraft thermal or 
pointing events or transits due to other planets orbiting the target star in multiple systems. 
We minimize noise by averaging as many in- and out-of-transit measurements as possible 
subject to these constraints. 

In each quarter, Kepler collects about 4300 long cadences, from which in- and out-of-transit
exposures need to be identified. We use the (unwhitened) transit model $M_n$ constructed
in Data Validation (Wu et al. 2010) to select these cadences.

In-transit cadences are defined as those cadences where the model is less than a threshold
proportional to the model transit depth. The current threshold is 3/4 of the transit depth:
when the model is normalized so that $M_n = 0$ for out-of-transit cadences, in-transit cadences
are those for which the model values $M_n < -\frac{3}{4} d$, where $d$ is the modeled fractional transit
depth.
The out-of-transit cadences are chosen near each transit under the following criteria:

• Out-of-transit cadences are chosen on both sides of the transit so that an average of
these out-of-transit cadences removes any locally linear secular trends.

• Not too many cadences are chosen, so that the effect of nonlinear variability on time
scales longer than the transit is small.

• Out-of-transit cadences should not be too close to the transit.

The number of out-of-transit cadences $N_{\rm out}$ is chosen as the number of cadences that occur
during the entire transit duration where $M_n < 0$. This is generally not the same as $N_{\rm in}$. The
out-of-transit cadences are chosen to lie more than $N_{\rm buffer}$ cadences from the cadences for
which $M_n < 0$. Fig. 14 shows an example of selected cadences for a typical transit.

After in- and out-of-transit cadences are chosen they are excluded if they are associated 
with any of the following events: 

• Data gaps such as Earth points and safe modes. 

• Cadences within a day after major spacecraft thermal events, such as recovery from 
Earth points and safe modes that significantly change the temperature distribution of 
the spacecraft and require many hours to return to thermal equilibrium. 

• Pointing anomalies such as attitude tweaks, and loss of fine-point events. 

• Interference by transits from other planets in multiple planet systems. An example of 
such interference is shown in Figure 15.

If more than a small number of cadences associated with a transit are excluded then 
the entire transit is excluded from the construction of the difference image. This threshold 
is currently set to zero, so if any cadences are excluded then the entire transit is excluded. 
As Kepler detects longer-period transits, for which fewer transits are available, this threshold
will be relaxed to one or two excluded cadences per transit.

Fig. 14. — An example of in- and out-of-transit cadence selection (KOI-221). Top: the
transit model $M_n$ for a selected cadence range in quarter 6. The x-axis shows the cadences
since the beginning of the Kepler science operations. The circles at the bottom of the transit
show the cadences that were chosen for the in-transit image. $N_{\rm in} = 4$ cadences were chosen
in the transit because they are below the threshold described in the text. The circles outside
the transit show the cadences chosen for the out-of-transit image. The full transit is six
cadences wide so $N_{\rm out} = 6$ cadences were chosen on both sides of the transit. The
out-of-transit cadences are $N_{\rm buffer} = 3$ cadences from the transit. Bottom: the actual transit in one
of the brighter pixels. The x-axis shows the cadences since the beginning of quarter 6.

Fig. 15. — An example of the interference with cadences chosen for a transit in the
Kepler-11 system. Seven out-of-transit points to the right of the transit are excluded because of
the interfering transit by the other planet candidate, which causes the entire transit to be
excluded from the construction of the average pixel images.

Once the final set of transits and their in- and out-of-transit cadences are identified, the
in-transit pixel values are averaged to produce the in-transit image and the out-of-transit
pixel values are averaged to produce the out-of-transit image. The pixel values are not whitened
or otherwise detrended: we rely on the averaging described in this section to remove local
secular trends. First the average pixel values are computed for each transit, then each
transit's averaged pixels in a quarter are averaged together to produce the final in- and
out-of-transit average pixel images for that quarter. The difference image for the quarter is then
the out-of-transit pixel image minus the in-transit pixel image.
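
The cadence selection and averaging of this section can be sketched as follows. This is a simplified illustration, not the Data Validation code: the function names, the representation of the model as a plain list, the `bad` set of excluded cadence indices, and the reading that $N_{\rm out}$ cadences are taken on each side of the transit are all assumptions made for the example. The exclusion threshold is fixed at zero, so a transit with any excluded cadence is dropped.

```python
def find_transits(model):
    """(start, end) index pairs, inclusive, of contiguous runs where model < 0."""
    runs, start = [], None
    for n, value in enumerate(model):
        if value < 0 and start is None:
            start = n
        elif value >= 0 and start is not None:
            runs.append((start, n - 1))
            start = None
    if start is not None:
        runs.append((start, len(model) - 1))
    return runs

def select_cadences(model, depth, n_buffer=3, bad=frozenset()):
    """Per-transit (in_transit, out_of_transit) cadence index lists."""
    selections = []
    for start, end in find_transits(model):
        # In transit: model below 3/4 of the (negative) transit depth.
        in_idx = [n for n in range(start, end + 1) if model[n] < -0.75 * depth]
        n_out = end - start + 1  # number of cadences with model < 0
        # Assumption: n_out cadences on each side, n_buffer cadences away.
        left = range(start - n_buffer - n_out, start - n_buffer)
        right = range(end + 1 + n_buffer, end + 1 + n_buffer + n_out)
        out_idx = list(left) + list(right)
        excluded = [n for n in in_idx + out_idx
                    if n < 0 or n >= len(model) or n in bad]
        if in_idx and not excluded:  # threshold of zero excluded cadences
            selections.append((in_idx, out_idx))
    return selections

def mean_image(pixels, cadences):
    """Average each pixel over the given cadences; pixels[n] is a flat pixel list."""
    return [sum(pixels[n][p] for n in cadences) / len(cadences)
            for p in range(len(pixels[0]))]

def difference_image(pixels, selections):
    """Return (difference, in_transit, out_of_transit) quarterly average images."""
    if not selections:
        raise ValueError("no usable transits in this quarter")
    def average(images):
        return [sum(values) / len(images) for values in zip(*images)]
    img_in = average([mean_image(pixels, i) for i, _ in selections])
    img_out = average([mean_image(pixels, o) for _, o in selections])
    return [o - i for o, i in zip(img_out, img_in)], img_in, img_out


# Toy quarter: one six-cadence transit of depth 1% on the first of two pixels.
depth = 0.01
model = [0.0] * 60
model[20:26] = [-0.5 * depth, -depth, -depth, -depth, -depth, -0.5 * depth]
pixels = [[100.0 * (1.0 + m), 50.0] for m in model]
selections = select_cadences(model, depth)
diff, img_in, img_out = difference_image(pixels, selections)
print(selections, diff)
```

With these toy values the selection mirrors the pattern described for Figure 14: four in-transit cadences, and six out-of-transit cadences on each side separated from the transit by three buffer cadences.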



3.3. Fitting the Pixel Response Function 



In this section we describe how the Kepler pixel response function (PRF) (Bryson et al.
2010) is used to provide a robust, high-precision estimate of the target star and transit
locations using the average out-of-transit and difference images constructed as described
in §3.2. This technique requires that the target star is several magnitudes brighter than
other stars in the out-of-transit pixels, and that the transit signal is sufficiently strong in
the difference image. In §3.3.2 we describe a quantitative measure of whether the average
images for a given target star have the required properties. §3.3.1 describes various ways in
which this method can be compromised and discusses mitigation strategies.

The PRF gives the long-cadence brightness of a pixel due to a star at a specified location.
The PRF can be thought of as the convolution of the optical PSF with the effects of pointing,
sub-pixel response and system electronics. In this section we write the PRF as a unit flux
function $f(\alpha, \delta, r_i, c_i)$ so $\sum_{i=1}^{P_{\rm total}} f(\alpha, \delta, r_i, c_i) = 1$, where $P_{\rm total}$ is the number of all pixels
that contain flux from a star at sky coordinates $(\alpha, \delta)$, and $r_i$ and $c_i$ are those pixels' row
and column coordinates. If the star has flux $b$, then the value of a pixel at row $r_i$ and column
$c_i$ due to that star will be $p_{r_i, c_i} = b f(\alpha, \delta, r_i, c_i)$, and the sum of all pixels containing flux from
that star is $\sum_{i=1}^{P_{\rm total}} p_{r_i, c_i} = b$. (In Bryson et al. (2010) the star location is defined in pixel
coordinates rather than sky coordinates. In this paper we include the projection from sky
coordinates to pixel coordinates in the PRF function $f$.)

Assume we are given a set of $P$ pixel values $p_i$ with rows $r_i$ and columns $c_i$ that form a
pixel image. The $P$ pixels need not contain all the flux from the target star, so $P$ may be
less than $P_{\rm total}$. A PRF fit to these pixels is the determination of sky coordinates
$(\alpha_{\rm fit}, \delta_{\rm fit})$ and flux $b_{\rm fit}$ that minimize the function

$$\chi^2 = \sum_{i=1}^{P} \frac{1}{\sigma_i^2} \left(p_i - b f(\alpha, \delta, r_i, c_i)\right)^2 \tag{14}$$

where $\sigma_i$ is the uncertainty in the pixel value $p_i$. This fit is performed iteratively via the
non-linear Levenberg-Marquardt algorithm (Levenberg 1944; Marquardt 1963). Formally this
is a three-dimensional fitting problem in the parameters $\alpha$, $\delta$ and $b$. The fit to $b$, however, can
be reduced to a linear problem once the position is known, so this problem can be treated as
a much faster two-dimensional non-linear fit in $\alpha$ and $\delta$. In each iteration of the
Levenberg-Marquardt algorithm the pixels $p_i$ at $(r_i, c_i)$ and the fit parameters $\alpha$ and $\delta$ are provided to
the model function. We first evaluate the uncertainty-normalized Kepler PRF at $\alpha$ and $\delta$,
computing $\hat{p}_i = f(\alpha, \delta, r_i, c_i)/\sigma_i$ for each pixel. The flux $b$ is the linear least-squares fit of
the uncertainty-normalized input pixel values $p_i/\sigma_i$ to the model $b\hat{p}_i$, given by

$$b = \frac{\sum_{i=1}^{P} (p_i/\sigma_i)\, \hat{p}_i}{\sum_{i=1}^{P} \hat{p}_i^2}. \tag{15}$$

The product $b\hat{p}_i$ is then returned by the model function. The Levenberg-Marquardt algorithm
seeks the $\alpha$ and $\delta$ that minimize $\sum_{i=1}^{P} (p_i/\sigma_i - b\hat{p}_i)^2$, which is the $\chi^2$ of Equation (14), after
several iterations. (In the Kepler pipeline this is implemented as a model function passed to the
MATLAB function nlinfit.) Once the iteration has converged, providing $(\alpha_{\rm fit}, \delta_{\rm fit})$, the final
estimate of $b$ can be computed as
$b_{\rm fit} = \left(\sum_{i=1}^{P} (p_i/\sigma_i)\, \hat{p}_i\right) / \left(\sum_{i=1}^{P} \hat{p}_i^2\right)$, where now
$\hat{p}_i = f(\alpha_{\rm fit}, \delta_{\rm fit}, r_i, c_i)/\sigma_i$.
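
The reduction of the three-parameter fit to a two-dimensional search, with the flux solved linearly at each trial position, can be illustrated with a short Python sketch. Several things here are assumptions rather than the pipeline's method: a Gaussian `prf` function stands in for the Kepler PRF, positions are in pixel coordinates (the sky-to-pixel projection is omitted), and a coarse-to-fine grid search replaces the Levenberg-Marquardt iteration so that the example needs only the standard library.

```python
import math

def prf(x0, y0, pixels, width=0.9):
    """Hypothetical Gaussian stand-in for the unit-flux PRF f at each (row, col)."""
    norm = 2.0 * math.pi * width ** 2
    return [math.exp(-((r - y0) ** 2 + (c - x0) ** 2) / (2.0 * width ** 2)) / norm
            for r, c in pixels]

def linear_flux(values, sigmas, model):
    """Weighted linear least-squares flux b for a fixed trial position."""
    p_hat = [m / s for m, s in zip(model, sigmas)]
    numerator = sum((v / s) * ph for v, s, ph in zip(values, sigmas, p_hat))
    return numerator / sum(ph * ph for ph in p_hat)

def chi2(x0, y0, pixels, values, sigmas):
    """Chi-square at a trial position, with the flux already solved linearly."""
    model = prf(x0, y0, pixels)
    b = linear_flux(values, sigmas, model)
    total = sum(((v - b * m) / s) ** 2 for v, m, s in zip(values, model, sigmas))
    return total, b

def fit_position(pixels, values, sigmas, start, span=2.0, levels=20, n=9):
    """Coarse-to-fine grid search over position (a swap-in for Levenberg-Marquardt)."""
    x, y = start
    for _ in range(levels):
        steps = [span * (2.0 * i / (n - 1) - 1.0) for i in range(n)]
        x, y = min(((x + dx, y + dy) for dx in steps for dy in steps),
                   key=lambda xy: chi2(xy[0], xy[1], pixels, values, sigmas)[0])
        span *= 0.5  # shrink the search window around the current best point
    return x, y, chi2(x, y, pixels, values, sigmas)[1]


# Noise-free synthetic star on a 7 x 7 pixel image.
pixels = [(r, c) for r in range(7) for c in range(7)]
true_x, true_y, true_b = 3.3, 2.6, 5000.0
values = [true_b * m for m in prf(true_x, true_y, pixels)]
sigmas = [math.sqrt(v + 1.0) for v in values]  # invented uncertainties
x_fit, y_fit, b_fit = fit_position(pixels, values, sigmas, start=(3.0, 3.0))
print(x_fit, y_fit, b_fit)
```

Solving for the flux inside the objective keeps the search two-dimensional, which is the point made in the text; a production fit would use a gradient-based optimizer and the real PRF model.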

The typical implementation of the Levenberg-Marquardt algorithm returns the Jacobian
$J$, which contains the derivatives of the model function with respect to position. To estimate
the uncertainty of the fit location we need the Jacobian of the position with respect to the
pixel values given by the model function. We obtain this from the pseudo-inverse of $J$,
$(J^T J)^{-1} J^T$, whose transpose gives the transformation $T = J (J^T J)^{-1}$. $T$ is a $P \times 2$ matrix, and
the columns of $T$ are normalized by the pixel uncertainties: $T_{ij} \rightarrow T_{ij}/\sigma_i$ for $j = 1, 2$. Then the PRF fit
location covariance matrix is $C = T^T C_{\rm pixel} T$, where $C_{\rm pixel}$ is the pixel covariance, and the fit
location uncertainties are the square roots of the diagonal of $C$: $\sigma_\alpha = \sqrt{C_{11}}$ and $\sigma_\delta = \sqrt{C_{22}}$.
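
A minimal sketch of this uncertainty propagation is given below, assuming that J is a P × 2 Jacobian of the uncertainty-normalized model with respect to the two position parameters and that the pixel covariance is supplied as a full P × P matrix. The function name `fit_covariance` and the numerical values are invented for illustration. For uncorrelated pixels, where the pixel covariance is diagonal with the squared uncertainties on the diagonal, the result reduces to the inverse of J^T J, which provides a convenient check.

```python
import math

def fit_covariance(jacobian, sigmas, c_pixel):
    """2 x 2 covariance of the fitted position from a P x 2 Jacobian."""
    # Elements of the symmetric 2 x 2 matrix J^T J.
    a = sum(row[0] * row[0] for row in jacobian)
    b = sum(row[0] * row[1] for row in jacobian)
    d = sum(row[1] * row[1] for row in jacobian)
    det = a * d - b * b
    if det == 0.0:
        raise ValueError("J^T J is singular; the position is not constrained")
    inv = [[d / det, -b / det], [-b / det, a / det]]
    # T = J (J^T J)^-1, with each row divided by that pixel's uncertainty.
    t = [[(row[0] * inv[0][j] + row[1] * inv[1][j]) / s for j in range(2)]
         for row, s in zip(jacobian, sigmas)]
    n_pix = len(jacobian)
    # C = T^T C_pixel T
    return [[sum(t[m][j] * c_pixel[m][n] * t[n][k]
                 for m in range(n_pix) for n in range(n_pix))
             for k in range(2)] for j in range(2)]


# Invented three-pixel example with uncorrelated pixel noise.
jacobian = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
sigmas = [1.0, 2.0, 0.5]
c_pixel = [[sigmas[m] ** 2 if m == n else 0.0 for n in range(3)] for m in range(3)]
cov = fit_covariance(jacobian, sigmas, c_pixel)
sigma_alpha, sigma_delta = math.sqrt(cov[0][0]), math.sqrt(cov[1][1])
print(cov, sigma_alpha, sigma_delta)
```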

The PRF is fit separately to the difference image and the out-of-transit image. Because
the fit to the difference image $(\alpha_{\rm diff}, \delta_{\rm diff})$ measures the position of the transiting source and
the fit to the out-of-transit image $(\alpha_{\rm oot}, \delta_{\rm oot})$ measures the position of the target star, the
offset of the transit source from the target is simply
$(\Delta\alpha, \Delta\delta) = ((\alpha_{\rm diff} - \alpha_{\rm oot}) \cos\delta_{\rm oot}, \delta_{\rm diff} - \delta_{\rm oot})$.
Then the offset distance and uncertainty are computed as in Equation (13).

In- and out-of-transit pixel images, and therefore difference images, can only be con- 
structed on a quarter-by-quarter basis. Images cannot be combined across quarters in a 
useful way because 

• The same star will fall on slightly different pixel locations in each quarter due to 
pointing differences and small asymmetries in the construction of the Kepler focal 
plane. 

• The Kepler PRF at the star's location can have large changes from quarter to quarter. 

• The pixel aperture generally varies in both size and shape from quarter to quarter. 

Two approaches to combining quarters will be described in §3.4.



3.3.1. Systematic PRF fit error 

Systematic error in the PRF fit arises primarily from two classes of sources: error
in the PRF model being fit and crowding. These errors cause biases in the offset vector
$(\Delta\alpha, \Delta\delta)$. There are various ways to control systematic PRF fit errors, so we examine these
errors in detail.



Sources of PRF fit error 



PRF Model Error. The PRF model contains various sources of error (Bryson et al.
2010) which lead to a priori unpredictable bias in the PRF-fit centroid. Because the target
star falls on different parts of the Kepler field of view in different quarters, variation of the
PRF across the focal plane causes the PRF error bias to vary from quarter to quarter.



Crowding Bias. The PRF fit is a single-star fit, and therefore assumes that the target
star in the out-of-transit image and the transit signal in the difference image are the only
stars present in the pixels. This is rarely the case in the out-of-transit image and sometimes
not the case in the difference image due to variability of field stars. Unlike the case of
photometric centroids described in §2, the effect of crowding on the PRF fit is difficult to
predict. Because field stars mostly cancel in the difference image, the crowding signal in
the out-of-transit and difference images can be very different. Therefore the PRF fit to the
out-of-transit and difference images can have very different biases, which leads to errors in
the offset vector $(\Delta\alpha, \Delta\delta)$. An example of a target with a large amount of crowding is shown
in Figure 16.

In the worst case there is a field star in the out-of-transit image brighter than the target
star, so the PRF fit to the out-of-transit image returns the centroid of the field star rather
than the target star. When this bright field star cancels in the difference image, so the
difference image is dominated by a transit on the target star, the offset vector $(\Delta\alpha, \Delta\delta)$
gives the distance of the transit signal from the field star rather than the target star. The
result is an incorrect measurement of a significant offset of the transit source from the target
star. An example of this situation, KOI-1860 (discussed in §2.2.1), is shown in Figure 17.



Mitigation of the impact of PRF fit error within a quarter 

Average out-of-transit and difference images are computed for each quarter, and these are 
fit by the PRF to estimate the offset of the transit source from the target star. PRF model 
error and crowding contribute systematic errors in this estimate. Here we discuss ways to 
mitigate these systematic errors within each quarter. In §3.4.1 we discuss the possibility
of averaging out these systematics across quarters.

The Kepler PRF for nearby stars will be very nearly the same, so the PRF model error
for those stars will be similar. Assuming low crowding, the PRF fit of the out-of-transit
image and the fit to the difference image will have similar biases due to PRF model error.
When forming the offset vector $(\Delta\alpha, \Delta\delta)$ as the difference between these two fits, these
biases should approximately cancel. We therefore prefer the offset vector computed as the
difference between these two fits when the target star is not highly crowded.

When the target star is highly crowded, crowding bias will dominate the out-of-transit 
PRF fit but rarely the difference image PRF fit. This bias is usually due to an error 
in the measurement of the target star position. As an alternative we compute the tran- 
sit source offset relative to the target star's catalog position. We define (Aa, A5) catalog = 
((a d iff - "catalog) cos 5 ca t a iog, 5diff - Catalog), where (a cata i g, Catalog) is the catalog position of 
the target star (usually from the Keplerinput catalog). When (Aa, AS) differs from (Aa, AS) ( 
by more than a Kepler pixel (3.98 arcseconds), the out-of-transit measurement of the target 
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Fig. 16. — An example of a target with large amounts of crowding (KOI-1861). The in- and 
out-of-transit images do not appear as a typical star, and the fact that this is due to crowding 
is indicated by the large number of asterisks on the image indicating many relatively bright 
background stars. The difference image, on the other hand, looks much more like a star 
because most of the background stars in the image have cancelled out, though there is still 
some residual background contamination. In this case the fit to the out-of-transit image will 
have a large bias relative to the target star, while the bias in the difference image fit will be 
much smaller. This results in a biased offset measurement of the transit source relative to 
the target star. Visual inspection of the difference image, however, indicates that the transit 
source is closer to the target star than the biased measurement would indicate. 
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Fig. 17. — An example of a target with bright field star that captures the out-of-transit 
PRF fit (KOI- 1860). The out-of-transit image is dominated by the bright star in the upper 
right corner, so this field star position will be returned by the PRF fit to the out-of-transit 
image. The difference image, however, shows a nicely star-shaped pattern at the location of 
the target star, so the target star position will be returned by the PRF fit to the difference 
image. The resulting offset vector measures the distance of the transit source (target star in 
this case) to the bright field star rather than the distance of the transit source to the target 
star. In this case blindly using the offset values would lead to the erroneous identification of 
a background false positive. 

star position $(\alpha_{\rm OOT}, \delta_{\rm OOT})$ likely contains large errors and the offset vector $(\Delta\alpha, \Delta\delta)$ should
be considered unreliable. The catalog-based offset $(\Delta\alpha, \Delta\delta)_{\rm catalog}$ can be used instead,
but is itself subject to error because a) it does not mitigate fit error due to PRF error and b) it
is subject to catalog errors due to, for example, unknown proper motion of the target star.
In this case the PRF fit results should be considered qualitative and to have lower accuracy
than those for non-crowded targets, regardless of the formal propagated uncertainty. In the example
in Figure 17 the magnitude of the offset vector in that quarter is about 11 arcseconds, while
the magnitude of the offset from the catalog position is about 0.6 arcseconds.
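
As an illustration of the catalog-based offset defined above, the following minimal Python sketch computes the offset of a fitted position from a reference position in arcseconds and flags a quarter whose out-of-transit fit is suspect. The function names, the use of degrees for the input coordinates, and the choice to compare the two offset vectors by the length of their difference are assumptions made for this example; they are not taken from the Kepler pipeline.

```python
import math

KEPLER_PIXEL_ARCSEC = 3.98  # pixel scale quoted in the text


def sky_offset_arcsec(ra_src_deg, dec_src_deg, ra_ref_deg, dec_ref_deg):
    """Offset (d_alpha, d_delta) of a source from a reference position, in arcsec.

    The RA difference is scaled by cos(dec) of the reference position,
    matching the definition of the catalog-based offset.
    """
    d_alpha = (ra_src_deg - ra_ref_deg) * math.cos(math.radians(dec_ref_deg)) * 3600.0
    d_delta = (dec_src_deg - dec_ref_deg) * 3600.0
    return d_alpha, d_delta


def oot_fit_suspect(offset_oot, offset_catalog, limit=KEPLER_PIXEL_ARCSEC):
    """True when the two offset vectors disagree by more than one pixel.

    Assumption: "differs" is read as the length of the vector difference.
    """
    diff = math.hypot(offset_oot[0] - offset_catalog[0],
                      offset_oot[1] - offset_catalog[1])
    return diff > limit
```

With the Figure 17 magnitudes (about 11 arcseconds relative to the out-of-transit fit and about 0.6 arcseconds relative to the catalog position), `oot_fit_suspect((11.0, 0.0), (0.6, 0.0))` returns `True`, so the catalog-based offset would be preferred for that quarter.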



A work in preparation (Bryson and Morton 2013) will describe the use of modeling to
identify and mitigate bias due to crowding.

In the majority of cases the bias will be due to a mix of crowding and PRF model error, 
with comparably small contributions from each. In this case we reduce the overall bias by 
taking advantage of the variation in bias across quarters via averaging as described in §3.4.



3.3.2. PRF Fit Quality 

The quarterly out-of-transit and difference images can be polluted by various types of 
contamination. For example the out-of-transit image may have bright stars in addition to the 
target star. The difference image may have more than one stellar image due to the variability 
of a field star, or the transit may have low SNR, causing the difference image to be poorly 
formed as in Figure 13. These cases will degrade the reliability of the PRF-fit source offset
measurement. The quality of the PRF fit can be determined by evaluating the PRF at the
fit position, creating a synthetic pixel image containing only one star at that position, and
comparing this to the observed average pixel image. This synthetic image will have the pixel
values $\hat{p}_i = b_{\rm fit} f(\alpha_{\rm fit}, \delta_{\rm fit}, r_i, c_i)$, where the subscript "fit" refers to "diff" or "OOT"
as appropriate. These can be compared to the actual pixel values $p_i$ to determine if the fitted
PRF reproduces the observed pixels. One simple comparison is to compute the correlation
between $\hat{p}_i$ and $p_i$, and declare the fit good if this correlation is above some threshold. For
the difference image fit quality we set the threshold to 0.7. When the correlation is below 
this threshold, then the difference image is likely dominated by noise, typically because the 
transit has a very low SNR. When the correlation is below threshold for the out-of-transit fit, 
then it is likely that there is more than one bright star in the image, which compromises the 
fit due to crowding. In both cases the source offset measurement is likely to be unreliable. 
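
The fit-quality test described above can be sketched in a few lines of Python. The sketch assumes that "correlation" means the Pearson correlation coefficient between the synthetic and observed pixel values, which the text does not state explicitly; the function names are hypothetical, and the 0.7 threshold is the value quoted for the difference image.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # a flat image carries no shape information
    return cov / math.sqrt(var_a * var_b)


def prf_fit_is_good(synthetic_pixels, observed_pixels, threshold=0.7):
    """Declare the PRF fit good when the correlation exceeds the threshold."""
    return pearson(synthetic_pixels, observed_pixels) > threshold
```

A star-like observed image that is a scaled copy of the synthetic image passes the test, while an image whose flux sits where the synthetic star is faint fails it.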

3.4. Combining Quarterly Results



A comparison of PRF-fit star positions with their catalog RA and Dec shows that the
combination of crowding and PRF error bias has an approximately Gaussian distribution
with a median of 1 millipixel (0.004 arcsec) and a median absolute deviation of 22 millipixels
(0.09 arcsec) (Bryson et al. 2010). While the quarter-to-quarter variation in the PRF fit of
a particular star can have larger spreads, we find that for most stars this quarter-to-quarter 
variation is approximately zero-mean on average. We therefore combine the quarterly offsets 
to improve the precision of the PRF-fit centroid offset vector. 



3.4.1. Multi-Quarter Averaging

We denote the single-quarter PRF fit offset vectors by $(\Delta\alpha_q, \Delta\delta_q)$, where $q$ labels
the quarter. A simple average of $Q$ quarters, $\frac{1}{Q}\sum_{q=1}^{Q} (\Delta\alpha_q, \Delta\delta_q)$, with its uncertainties
$\frac{1}{Q}\sqrt{\sum_{q=1}^{Q} (\sigma^2_{\Delta\alpha_q}, \sigma^2_{\Delta\delta_q})}$, can be used, but this has the weakness that the uncertainties do not
reflect scatter in the quarterly averages. For example a set of points on a large circle with
some uncertainty will have the same average and average uncertainty as a set of points with 
the same uncertainty that all lie at the center of the circle. We would like the uncertainty 
to reflect the scatter of the quarterly offsets. 

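We accomplish this by treating the quarterly offset vectors and their uncertainties as a
time series, and compute the average offset $(\overline{\Delta\alpha}, \overline{\Delta\delta})$ by robustly fitting this time series with
a constant. In other words we compute a least-squares robust fit of a 0th-order polynomial
to the quarterly data, minimizing

$$\sum_{q=1}^{Q} \frac{1}{\sigma^2_{\Delta\alpha_q}} \left(\Delta\alpha_q - \overline{\Delta\alpha}\right)^2, \qquad \sum_{q=1}^{Q} \frac{1}{\sigma^2_{\Delta\delta_q}} \left(\Delta\delta_q - \overline{\Delta\delta}\right)^2. \qquad (16)$$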

We compute a robust fit to suppress statistical outliers in the belief that these are due to 
transient biases resulting from systematic events such as pointing or thermal anomalies. 
The uncertainties in the above fit are typically returned by the robust fit algorithm used 
to compute $(\overline{\Delta\alpha}, \overline{\Delta\delta})$. Care must be taken when estimating these uncertainties a priori
from the quarterly data because the spacecraft returns to the same orientation every fourth
quarter, so the offsets from those quarters are strongly correlated.
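
The text does not name the robust fitting algorithm, so the following Python sketch should be read as one possible realization rather than the pipeline's method. It assumes an iteratively reweighted inverse-variance mean with Tukey bisquare weights, and it inflates the formal uncertainty by the residual scatter when that scatter exceeds the quarterly uncertainties. The function name, the tuning constant, and the inflation rule are all assumptions of this example.

```python
import math


def robust_constant_fit(values, sigmas, tune=4.685, n_iter=20):
    """Robustly fit a constant to measurements with uncertainties.

    Assumption: Tukey bisquare reweighting of an inverse-variance weighted
    mean. Returns (constant, uncertainty); the uncertainty is the formal
    error of the weighted mean, inflated when the residual scatter is
    larger than the quoted sigmas.
    """
    base = [1.0 / s ** 2 for s in sigmas]
    weights = base[:]
    const = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    for _ in range(n_iter):
        resid = [(v - const) / s for v, s in zip(values, sigmas)]
        mad = sorted(abs(r) for r in resid)[len(resid) // 2]
        scale = max(mad / 0.6745, 1.0)  # never tighter than the formal errors
        weights = []
        for b, r in zip(base, resid):
            u = r / (tune * scale)
            weights.append(b * (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
        const = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    used = sum(1 for w in weights if w > 0.0)
    chi2 = sum(w * (v - const) ** 2 for w, v in zip(weights, values))
    formal = 1.0 / math.sqrt(sum(weights))
    scatter = math.sqrt(chi2 / max(used - 1, 1))
    return const, formal * max(scatter, 1.0)
```

Applied to one coordinate at a time, a single discrepant quarter is driven to zero weight, while a set of quarters scattered far more widely than their formal uncertainties yields a correspondingly larger uncertainty on the average, which is the behavior the circle example above asks for.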

The above estimate of the average uncertainty assumes Gaussian statistics. While PRF 
fit biases appear nearly Gaussian in the statistical sense, they may not be Gaussian for 
individual targets. We therefore compute an alternative uncertainty via bootstrap analysis,
which provides a more general estimate of the uncertainty. We use a resample-with-replacement
strategy, creating an ensemble of $Q^2$ simple multi-quarter averages. Specifically,
given the set of $Q$ measured offsets $(\Delta\alpha_1, \Delta\alpha_2, \ldots, \Delta\alpha_Q)$, $Q^2$ realizations are created, where
in each realization we replace each element with an offset randomly chosen from the measured
set. Examples of these realizations when $Q = 5$ include $(\Delta\alpha_3, \Delta\alpha_1, \Delta\alpha_5, \Delta\alpha_4, \Delta\alpha_2)$
and $(\Delta\alpha_2, \Delta\alpha_4, \Delta\alpha_1, \Delta\alpha_4, \Delta\alpha_1)$. Averages are computed for each of these realizations, and
the standard deviation of the resulting ensemble of $Q^2$ averages provides the bootstrap
uncertainty estimate. The bootstrap uncertainty is typically very similar to the uncertainty
returned by the robust fit described above, but can be significantly different for specific targets.
We choose the larger of the two uncertainty estimates as the final uncertainty estimate
for the multi-quarter average, $\sigma_{\overline{\Delta\alpha}}$. A similar analysis applies to $\sigma_{\overline{\Delta\delta}}$.
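
The bootstrap estimate described above is short enough to write out directly. The sketch below follows the recipe in the text: Q measured offsets, Q^2 realizations drawn with replacement, and the standard deviation of the resulting averages. The function names and the fixed random seed are assumptions added so that the example is repeatable.

```python
import random
import statistics


def bootstrap_uncertainty(offsets, seed=0):
    """Standard deviation of Q^2 resampled-with-replacement averages."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    q = len(offsets)
    averages = []
    for _ in range(q * q):
        realization = [rng.choice(offsets) for _ in range(q)]
        averages.append(sum(realization) / q)
    return statistics.pstdev(averages)


def final_uncertainty(robust_sigma, bootstrap_sigma):
    """The larger of the robust-fit and bootstrap uncertainty estimates."""
    return max(robust_sigma, bootstrap_sigma)
```

The same function would be called once with the quarterly $\Delta\alpha_q$ values and once with the $\Delta\delta_q$ values.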

Examples of this multi-quarter averaging technique are shown in Figures 18 through
22. Figure 18 shows a case with no significant offset, while Figure 19 shows a case with
a significant offset, indicating that the transit signal is on a background star. For
long-period transiting planets, where there are few quarters that contain transits, the benefits of
multi-quarter averaging will diminish. In such cases, however, multi-quarter averaging can
often provide good results, an example of which is shown in Figure 20. Figure 21 shows
the low SNR example discussed in §3.1, where we see that there is a large scatter in the
quarterly measurements, but the multi-quarter average is within three standard deviations 
of the target star. 

The case of KOI-1860, where a bright field star at the edge of the captured pixels 
introduces large systematic error, is examined in Figure 22. The offset relative to the out-
of-transit centroid is measured to be about 4 arcseconds, which is a statistically significant
$4\sigma$. For most quarters, particularly those which would show a larger offset, the PRF fit
to the out-of-transit image failed because the bright star falls very close to the edge of the
captured pixels. The offset relative to the catalog position, however, is much smaller, with
a multi-quarter average of about 0.3 arcseconds or $1\sigma$. Because we are aware of the bright
star crowding for KOI-1860, we defer to the offset relative to the catalog position, which is 
not statistically significant. 

We demonstrate the increased precision of the multi-quarter average in Figure 23. The
offset distance from the target catalog position is shown for both individual quarter PRF fits
and their quarterly average. This analysis uses 2,278 KOIs whose quarterly averaged offsets
are less than $3\sigma$ and whose offsets from the target are < 5 arcseconds in the Q1-Q12 data.
The left panel shows the 21,401 individual quarter offsets, while the right panel shows the 
offset of the average over all quarters for each target. The individual quarter offsets have 
a standard deviation of 0.90 arcseconds, while the multi-quarter averages over 12 quarters 

Fig. 18. — An example of multi-quarter offset analysis when the transit signal seems to be
on the target star (KOI-221). In both figures the x- and y-axes give the offsets $\Delta\alpha$ and
$\Delta\delta$, with (0, 0) being the catalog location of the target star. The green crosses show the
individual quarter offsets labeled by quarter, and the lengths of the crosses are equal to the
uncertainties $\sigma_{\Delta\alpha}$ and $\sigma_{\Delta\delta}$. The location of the multi-quarter average $(\overline{\Delta\alpha}, \overline{\Delta\delta})$ is shown as
a magenta cross (obscured by the tight cluster of green crosses). The blue circle has radius
equal to three times the uncertainty in the magnitude of $(\overline{\Delta\alpha}, \overline{\Delta\delta})$. Star locations relative
to the target star are shown as asterisks, with the target star in red (there happen to be
no other stars in this figure). The KIC catalog number and Kepler magnitudes are shown
next to each star. We see that most offsets are tightly clustered within 0.1 arcseconds of the
target star with Q1 and Q2 as outliers. Left: the offsets $(\Delta\alpha, \Delta\delta)$ relative to the PRF fit to
the out-of-transit centroid. Right: the offsets $(\Delta\alpha, \Delta\delta)_{\rm catalog}$ relative to the catalog position
of the target star. The difference between the left and right plots is not a simple translation
because the two plots have different biases due to PRF error and crowding (see §3.3.1).

Fig. 19. — An example of multi-quarter offset analysis when the transit signal seems to be
on a different star than the target star (KOI-109). The quarterly offsets are tightly clustered
around the star KIC 4752452, indicating that this star is the source of the transit. See the
caption to Figure 18 for a description of these plots.

Fig. 20. — An example of multi-quarter offset analysis for a confirmed planet signal (Kepler-22b)
with a very long period orbit, so only four quarters show transits. The result is a
larger scatter and higher average uncertainty compared to the case where there are transits
present in every quarter. Also there is a significant difference in the offsets relative to the
out-of-transit centroid in the left panel and relative to the target star's catalog position in
the right panel. This is likely due to a combination of not-fully-averaged PRF bias and
catalog error. If this planet were not confirmed by other methods (Borucki et al. 2012) we
would have only moderate confidence that the transit signal is on the target star. See the
caption to Figure 18 for a description of these plots.

Fig. 21. — An example of multi-quarter offset analysis for a low SNR transit signal (KOI-2949)
with SNR = 11. In this case the quarterly offsets have a large scatter measured in
arcseconds, but the average across quarters is within 3 standard deviations of the target star.
See the caption to Figure 18 for a description of these plots.

Fig. 22. — An example of multi-quarter offset analysis for a target star (KOI-1860, also
discussed in §2.2.1) whose pixels contain a brighter field star (see Figure 17). The offsets
relative to the out-of-transit centroid are large because the bright star captured the out-of-
transit PRF fit. The out-of-transit PRF fit also failed in many quarters because the bright
star is at the edge of the pixel aperture. The offsets relative to the target star's catalog
position are, however, well clustered around the target star, indicating that the offset of the
transit is not statistically significant. We therefore conclude that the large offset relative to
the out-of-transit centroid is due to systematic effects from the bright field star in the pixels.
See the caption to Figure 18 for a description of these plots.

have a standard deviation of 0.41 arcseconds. Strong year-to-year correlations prevent the
standard deviation from scaling as $1/\sqrt{Q}$, but do not prevent an improvement as $Q$ increases.
Figure 24 shows how the standard deviation depends on the number of quarters averaged.
We see that, statistically, adding a quarter always increases the precision of the
multi-quarter average, though this may not be the case for every individual target.
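
To make the departure from $1/\sqrt{Q}$ scaling concrete, the short Python sketch below compares the quoted standard deviations with the value that fully independent quarters would give. The function names are hypothetical, and the notion of an "effective number of independent quarters" is an interpretation added for this example, not a quantity defined in the analysis above.

```python
import math


def uncorrelated_std(single_quarter_std, n_quarters):
    """Standard deviation of an average of independent, equal-variance quarters."""
    return single_quarter_std / math.sqrt(n_quarters)


def effective_quarters(single_quarter_std, averaged_std):
    """Number of independent quarters that would give the observed reduction."""
    return (single_quarter_std / averaged_std) ** 2


# Values quoted in the text: 0.90 arcsec per quarter, 0.41 arcsec after averaging.
expected_if_independent = uncorrelated_std(0.90, 12)
n_effective = effective_quarters(0.90, 0.41)
```

If the 12 quarters were independent the averaged standard deviation would be roughly 0.26 arcseconds, noticeably smaller than the measured 0.41 arcseconds; under this reading the 12 quarters behave like about 5 independent measurements.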

Fig. 23. — Distributions of the PRF-fit offset from the target catalog position for 2,278 KOIs
whose quarterly averaged offsets are less than $3\sigma$ and whose offsets from the target are < 5
arcseconds. Left: the distribution of individual quarter offsets. Right: the distribution of 
the multi-quarter averages. 



3.4.2. Joint Multi-Quarter PRF Fit

When the transit SNR is very low, there may not be enough signal in each quarterly 
difference image to support per-quarter PRF fitting. In this case we perform a joint multi- 
quarter fit, where the pixel images for all quarters are supplied to the PRF fitter, and the 
single RA and Dec (and quarter-specific PRF amplitude) is found that minimizes the pixel- 
level difference between the pixel images and PRF-reconstructed pixels over all quarters. In 
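other words, the joint multi-quarter fit finds the single sky position $(\alpha, \delta)$ that minimizes the
function

$$\chi^2 = \sum_{q=1}^{Q} \sum_{i=1}^{P_q} \frac{1}{\sigma^2_{p_{i,q}}} \left( p_{i,q} - b_q f_q(\alpha, \delta, r_{i,q}, c_{i,q}) \right)^2, \qquad (17)$$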

Fig. 24. — The standard deviation of the multi-quarter average as a function of the number
of quarters used in the average. The x-axis shows quarters used, where for each point the 
average is taken for the transits found in quarters 1 through the x-axis value. 

where the subscript $q$ means the quarter-specific values of each quantity. So in each quarter
the flux-normalized PRF $b_q f_q$ for that quarter is evaluated at $(\alpha, \delta)$ (which is common to all
quarters) for that quarter's pixels $(r_{i,q}, c_{i,q})$. These PRF-based pixel values are subtracted
from the observed pixel values $p_{i,q}$ for each quarter. The square of this difference normalized
by the uncertainty is summed over all the pixels in that quarter, and finally summed over
all quarters producing the test $\chi^2$ value. The sky position is varied until the $(\alpha, \delta)$ that
minimizes $\chi^2$ is found. The details of the computation in each quarter are similar to the
single-quarter fit in §3.3.
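
The structure of the joint fit can be illustrated with a toy Python example. Everything specific in it is an assumption made for illustration: the Kepler PRF is replaced by a circular Gaussian with a quarter-specific width, the common position is expressed directly in pixel coordinates rather than as a sky position mapped into each quarter's pixels, and the minimization is a brute-force grid search. Only the overall shape (one shared position, one amplitude per quarter, $\chi^2$ summed over pixels and quarters) follows the description above.

```python
import math


def gaussian_prf(x0, y0, col, row, width):
    """Toy stand-in for the PRF (assumption): a circular Gaussian."""
    return math.exp(-((col - x0) ** 2 + (row - y0) ** 2) / (2.0 * width ** 2))


def joint_chi2(x0, y0, quarters):
    """Sum over quarters and pixels of ((p - b_q * f_q) / sigma)^2.

    Each quarter is a dict with 'width' and 'pixels', a list of
    (col, row, value, sigma). The amplitude b_q is solved per quarter
    by linear least squares.
    """
    total = 0.0
    for quarter in quarters:
        pix = quarter["pixels"]
        f = [gaussian_prf(x0, y0, c, r, quarter["width"]) for c, r, _, _ in pix]
        num = sum(p * fi / s ** 2 for (_, _, p, s), fi in zip(pix, f))
        den = sum(fi ** 2 / s ** 2 for (_, _, _, s), fi in zip(pix, f))
        b = num / den
        total += sum(((p - b * fi) / s) ** 2 for (_, _, p, s), fi in zip(pix, f))
    return total


def joint_fit(quarters, lo=0.0, hi=4.0, step=0.1):
    """Grid search for the single position that minimizes the joint chi^2."""
    n = int(round((hi - lo) / step)) + 1
    grid = [lo + k * step for k in range(n)]
    return min(((x, y) for x in grid for y in grid),
               key=lambda xy: joint_chi2(xy[0], xy[1], quarters))


def make_quarter(x0, y0, amplitude, width, size=5):
    """Noise-free synthetic quarter used only to exercise the sketch."""
    pixels = [(c, r, amplitude * gaussian_prf(x0, y0, c, r, width), 1.0)
              for r in range(size) for c in range(size)]
    return {"width": width, "pixels": pixels}
```

For two synthetic quarters built with `make_quarter(2.3, 1.7, 50.0, 0.8)` and `make_quarter(2.3, 1.7, 80.0, 1.0)`, `joint_fit` returns the grid point at the common position even though the amplitudes and PRF widths differ between quarters.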

The propagated uncertainty in this fit does not account for scatter across quarters due 
to systematic error, so it dramatically underestimates the actual uncertainty in this fit. 
We compute a more accurate uncertainty via a bootstrap approach much like that for the
multi-quarter averages described in §3.4.1, except the data consist of pixel images rather
than offsets and each element of the ensemble is a joint PRF fit. Specifically, the
multi-quarter PRF fit takes as input the set of pixel images $(I_1, I_2, \ldots, I_Q)$ constructed in §3.2,
where $I_q$ is the pixel image for each quarter. The bootstrap approach creates an ensemble
of resampled-with-replacement sets of pixel images, for example $(I_4, I_5, I_3, I_2, I_2)$ if $Q = 5$.
The multi-quarter fit is performed on each element of the ensemble, computing a best fit
$(\alpha, \delta)$ for each one. Each image in a resampled set is fit with the parameters from the quarter
in which it was observed. For example if the first image in the set is $I_4$, then the PRF
from quarter 4 is applied to those quarter 4 pixels. The uncertainty in the joint multi-quarter
fit is then set to the standard deviation of the ensemble of fit positions. 

The size of the resampled ensemble needs to be chosen with care. The time to compute 
the joint multi-quarter fit scales with the number of quarters $Q$. If the usual choice of $Q^2$ were
chosen for the size of this ensemble, the full computation of the joint fit and its uncertainties
would scale as $Q^3$. In the Kepler pipeline, a bootstrap joint fit of 8 quarters took about 20
minutes, which indicates that a 16-quarter fit would take almost three hours. It is prohibitive 
to run this on all 15,000 to 20,000 threshold crossing events identified by the pipeline. The 
joint PRF fit is therefore not routinely run in the Kepler pipeline, but is reserved for low 
SNR transits for which the multi-quarter average does not provide a sufficiently precise 
result. The possible use of a smaller resampled ensemble is under investigation. 
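
The run-time estimate quoted above follows directly from the cubic scaling. A minimal Python sketch of that arithmetic, with a hypothetical function name and the 8-quarter, 20-minute reference point taken from the text:

```python
def bootstrap_joint_fit_minutes(n_quarters, ref_quarters=8, ref_minutes=20.0, power=3):
    """Extrapolate the bootstrap joint-fit run time assuming Q^power scaling."""
    return ref_minutes * (n_quarters / ref_quarters) ** power


minutes_16 = bootstrap_joint_fit_minutes(16)  # 20 * 2**3 = 160 minutes
hours_16 = minutes_16 / 60.0
```

Doubling the number of quarters multiplies the time by eight, giving 160 minutes (about 2.7 hours) for 16 quarters, consistent with the "almost three hours" stated above.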



4. Pixel Correlation Images 

An alternative method for determining the location of the transit signal in the pixels is 
to fit the transit model to the individual pixel flux time series. This uses the same fitting 
method described in §2.2, with the centroid time series replaced by the pixel flux time series.
In this case the fit constant $\gamma$ is a measure of the presence of the transit signal in each
individual pixel. An example of these fits is shown in Figure 25. A pixel correlation image
can be constructed by setting the value of each pixel to its model fit value $\gamma$. When this
is done for the example in Figure 25, we get the pixel image in the left panel of Figure 26.
The right panel of Figure 26 shows an example where the transit signal is offset from the
target star. For such high SNR targets, the transit signal is readily apparent in the pixels, 
and the correlation image has a star-like appearance. In these cases the photometric or PRF 
centroiding can be applied to quantitatively and automatically compute the location of the 
transit, which can be compared to the catalog position of the target star or the target star 
location from the PRF fit to the difference image. 
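
A minimal Python sketch of the correlation image construction is given below. It assumes that the per-pixel fit is an unweighted linear least-squares fit of a single scale factor multiplying the transit model, and that each pixel time series has already been detrended and had its out-of-transit level removed; the fitting method of §2.2 may differ in detail, and the function names are hypothetical.

```python
def fit_scale(model, flux):
    """Least-squares scale factor minimizing sum((flux - scale * model)^2)."""
    return sum(m * f for m, f in zip(model, flux)) / sum(m * m for m in model)


def correlation_image(model, pixel_series):
    """Build an image whose pixel values are the fitted scale factors.

    pixel_series[row][col] is the flux time series of one pixel, sampled
    at the same times as the transit model.
    """
    return [[fit_scale(model, series) for series in row] for row in pixel_series]
```

A pixel whose time series is three times the transit model receives the value 3, while a pixel with no variation receives 0, so pixels containing the transit signal stand out in the resulting image.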

When the transit has low SNR or the pixels have significant flux from other sources, 
the pixel correlation image can be of much lower quality. Two examples of this situation are 
shown in Figure 27.

Because the correlation image is degraded by background flux and can have poor be- 
havior at low SNR, it is not generally used for false positive identification. There are circum- 
stances, however, where the correlation image can be used in combination with the other 
methods to make a determination. For example, some low SNR targets have marginal dif- 
ference and correlation images, but if they show the transit signal in the same pixel location 
then we have increased confidence that the transit signal in those pixels is real. 



5. Saturated Targets 

Target stars with Kepler magnitudes brighter than ~11.5 can exhibit saturation, where
the flux in a pixel exceeds that pixel's full well and spills up and down the pixel columns
(Caldwell et al. 2010). The result is that the pixel image of the star can be highly distorted,
invalidating all of the centroid methods described in this paper. Saturation can be highly 
asymmetric, so even photometric centroids are of limited use. Visual inspection of the 
difference image can, however, reveal large, multi-pixel offsets indicating that the transit is 
not on the saturating star. 

When the saturated star is the transit source, the difference image will have a distinctive, 
non-star-like, pattern. Because the saturation spills along columns and the amount of spill 
is approximately proportional to the flux of the star, a transit signal on a saturated star will 
appear in the difference image as changes at the ends of the saturated columns. An example 
is shown in Figure 28. This is a characteristic pattern in the difference images for saturated
targets. All that can be said in this case is that the transiting source is in approximately the 

Fig. 25. — Fits of the transit model to individual pixel flux time series for KOI-221 in quarter
7. The pixel flux time series is shown in blue and transit model is in red. Each pixel flux 
time series is detrended and folded on the transit period. A closeup of the transit event is 
shown, with the same time interval on all x axes. The y-axes show the pixel values and are 
scaled to show the variation in each pixel time series. The pixel rows are shown along the 
left, and pixel columns along the bottom. The pixels that strongly contain the transit signal 
indicate the location of the transit source. 

Fig. 26. — Correlation images, created by assigning each pixel the scale factor that multiplies
the transit model to best fit that pixel's flux time series. Left: the example from Figure 25
of the transit signal being coincident with the target star (KOI-221). Right: an example
with the transit signal significantly offset from the target star (KOI-109). In these figures 
the small white squares indicate pixels for which the fit scaling is above a threshold. 

Fig. 27. — Correlation images for more problematic transits. Left: an example where there
is a field star in the aperture brighter than the target star (KOI-1860). Variability of the
bright star pollutes the correlation image, but the transit signal is still apparent. Right: a
low SNR example (KOI-2949) with SNR = 11. For such low SNR transits, the transit signal
is barely discernible in the individual pixel time series, which causes the correlation image
to be dominated by background variability and pixel-level systematics. 

same column position as the target star, between the ends of the saturation. If the transit
were due to a field star that is not in the saturated pixels, the difference image would show 
that star and not the signal from the saturated pixels. 

Special investigation of saturated targets can sometimes refine the location of the transit 
signal. The appearance of the transit at the end of the saturated columns is sensitive to the 
column position of the transiting source. If the transit SNR is high enough, the wings of 
the transits can be subject to a PRF fit while masking out the saturated columns. These 
techniques have been applied with some success, identifying the location of the transit signal
to within 4 arcseconds, for Kepler-21b (Howell et al. 2012). We refer the reader to that
publication for details. 



6. Performance and Comparison of Techniques 

In this section we examine the performance of our transit-source location estimation 
via photometric and PRF-fit centroids. We focus on offset distances because that is the 
high-level metric used in initial false positive identification. We examine three populations 
of targets: 

• all Kepler objects of interest (KOIs) dimmer than Kepler magnitude 11.5 (to avoid
saturated targets (Caldwell et al. 2010)), which have well-defined transit-like signals
of sufficient quality to pass vetting and produce an ephemeris and valid PRF fits (4,049
KOIs). Many of these KOIs are in multiple systems.

• unsaturated KOIs that have been identified as being due to transit sources that are 
unlikely to be on the target, called Active Pixel Offsets (APOs), that have valid PRF 
fits as of July 2012 (178 KOIs). 

• a small number of APO KOIs whose transit signals have been identified with stars in 
the Kepler input catalog (16 KOIs). 

In this section we focus on the following questions: 

• How well do the methods identify the location of these sources? 

• Is there evidence that the source locations correspond to a uniform distribution of 
background sources? 

• How do these methods compare with one another with respect to accuracy and preci- 
sion? 

Fig. 28. — An example of a transit signal on a saturated star for the confirmed planet
Kepler-21b (Howell et al. 2012). The host star has Kepler magnitude = 8.4 and is highly
saturated. In the difference image the transit is apparent in the pixels at the end of the 
saturation in columns 612 and 613 (the star labels have been removed from the difference 
image for clarity). The target star is near the boundary between these two columns, which 
is why there is about equal saturation in both columns. Note the strong asymmetry in the 
saturation for this quarter, with the saturation going up the columns significantly further 
than down. 

We also address an issue that arises with high-transit-SNR targets, where offsets can be
very small but the formal uncertainty can be much smaller. In this situation we encounter 
residual bias that is not accounted for in the uncertainty, which causes offsets to incorrectly 
seem statistically significant. 

6.1. Accuracy 

We use APO targets whose transit signals have been associated with known stars to 
measure how accurately our two primary methods of photometric and PRF-fit centroids 
identify the source location. This association is determined by manual investigation of the 
difference images independently of the offset computations. We see in Figure 29 that the PRF
estimate of the transit source offset is close to the star identified as the transit signal source. 
For APOs with small offsets (< 4 arcseconds) the photometric centroids also have good 
accuracy. For APOs with larger offsets, however, photometric centroids show large errors. 
This behavior is expected because the Kepler pipeline uses one set of pixels to estimate the 
depth of the transit signal and a larger set of pixels to compute the photometric centroid. As 
described in §2.3.1, when the transit source has significant flux that falls outside the pixels
used for the depth estimate, which is the case when the source is more than 4 arcseconds 
from the target star, there can be significant error in the transit source location inferred 
from the photometric centroids. 

Figure 30 compares the PRF-fit and photometric centroid source offset estimates for
all KOIs, and shows that the photometric centroid estimate of the source offset is generally
(but not always) larger than the PRF-fit estimate when the PRF-fit source location
is more than a few arcsec from the target.

Figure 31 compares the PRF-fit source offset relative to the target star catalog position
with the PRF-fit source offset relative to the out-of-transit PRF-fit centroid. These two
offsets are similar for the majority of stars, with outliers that are likely caused by
crowding bias.

Figure 32 compares the distribution of the APO KOIs and the distribution of observed
pixel area relative to target stars. The fact that these two distributions have similar shapes 
with similar peaks is consistent with the identified APOs representing a uniform background 
of eclipsing binaries and possibly large planetary transits. This consistency contributes to 
our confidence that the APOs are correctly identifying astrophysical false positives. 

Fig. 29. — Left: The distance of the PRF-fit and photometric centroids from known stars
that are likely to be the source of confirmed APO transit signals (y-axis) vs. the distance 
of the known star from the target star (x-axis). Right: the same stars, showing the offset 
of the centroid from the target star (y-axis). The PRF offsets are relative to the target star 
catalog location for consistency with the photometric offsets. 



6.2. Precision vs. SNR 

The precision of a centroid measurement is dependent on the strength of the transit 
signal in each pixel. This strength depends on the transit depth, host star brightness and 
number of transits among other factors. All of these factors contribute to the transit SNR, 
so we analyze precision as a function of transit SNR. Figure 33 shows the dependence of
formal centroid source offset uncertainty on transit SNR. Both the PRF-fit and photometric
centroid methods show similar dependencies, though the uncertainties for the PRF-fit centroid
method are somewhat smaller. A linear fit to the log-log data gives the uncertainty of
the two methods as

$$\sigma_{\rm photometric} = \frac{13.6 \pm 0.16}{({\rm SNR})^{1.05 \pm 0.00}}, \qquad \sigma_{\rm PRF\text{-}fit} = \frac{3.39 \pm 0.10}{({\rm SNR})^{0.89 \pm 0.01}}. \qquad (18)$$
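
For readers who want to evaluate these power laws, the following Python sketch encodes the central values of the two fits (coefficients 13.6 and 3.39, exponents 1.05 and 0.89). The function names are hypothetical, the fit uncertainties are ignored, and the result is in whatever units the fitted offset uncertainties are expressed in, which is not restated here.

```python
def offset_uncertainty(snr, coefficient, exponent):
    """Power-law formal offset uncertainty: coefficient / SNR**exponent."""
    return coefficient / snr ** exponent


def sigma_photometric(snr):
    """Central-value fit for the photometric centroid method."""
    return offset_uncertainty(snr, 13.6, 1.05)


def sigma_prf_fit(snr):
    """Central-value fit for the PRF-fit method."""
    return offset_uncertainty(snr, 3.39, 0.89)


ratio_at_snr_100 = sigma_photometric(100.0) / sigma_prf_fit(100.0)
```

Over the range of transit SNR relevant here the PRF-fit uncertainty is the smaller of the two, and because its exponent is smaller the two curves slowly converge as SNR increases.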

These fits, along with the range of values implied by the 1-$\sigma$ uncertainties in the fit
parameters, are shown in Figure 34. The uncertainty of the photometric centroid method is
inversely proportional to the SNR, as expected, while the PRF-fit method has a somewhat
weaker dependence on inverse SNR. The coefficients of these uncertainties (13.6 for photometric
uncertainties and 3.39 for the PRF fit) are larger than the full-width-half-max expected
for centroid uncertainties because these uncertainties include contributions from the offset
computation. The uncertainties reported in this section are propagated formal uncertainties,
however, which are only valid if all noise sources are zero-mean Gaussian white noise.
As described in this paper there are several sources of systematic error that impact transit
source offset estimation. These systematic errors are not reflected in the formal uncertainty.

Fig. 30. — Left: A comparison between the PRF-fit offsets (x-axis) and the photometric
centroid source offsets (y-axis) from the target star catalog position. Right: The ratio PRF-fit
offsets/photometric centroid source offsets (y-axis) vs. magnitude of the PRF-fit offsets
(x-axis). APO KOIs are marked by circles. The red line in both figures indicates equality
between the PRF-fit and photometric offsets. We see that the photometric centroid estimate
of the source distance agrees with the PRF estimate for distances of a few arcsec from the
target star. As expected, the photometric centroid usually overestimates the offset for transit
sources that are further from the target star (see §2.3.1).

Fig. 31. — A comparison of the PRF-fit source offset relative to the PRF fit to the
out-of-transit pixel image (x-axis) and the PRF-fit source offset relative to the catalog position
of the target star. APO KOIs are marked by circles. We see that most targets with large
offsets cluster along the diagonal indicating that the two offsets are generally in reasonable
agreement. Outliers are likely due to crowding issues.

Fig. 32. — Left: the distribution of PRF-fit source offsets for targets identified as APOs.
There is a strong peak at about 6-7 arcseconds. This distribution is strongly dependent
on the pixel aperture associated with each target star, which limits the offset that can be
detected. Right: the distribution of pixel area as a function of distance from the target star
associated with each pixel, across the Kepler field of view. This distribution also has a peak at
about 7 arcseconds. The similarity between these two distributions is consistent with the
identified APOs representing a uniform distribution of background sources such as eclipsing
binaries and large transiting planets.




Fig. 33. — Formal offset uncertainty vs. transit SNR for PRF fit (left) and photometric 
(right) centroids using 12 quarters of data. The red dashed line in both figures shows the 
1/SNR dependency for comparison. We see that the precision of the PRF-fit offsets is
somewhat better on average than the PRF fit offsets. This precision does not account for 
bias due to systematic error for either type of centroid. 



Because the dependence on SNR of the PRF-fit and photometric centroid estimates of the source
offset has similar log slopes, we expect that if one technique indicates a significant
offset then the other technique will as well. This is shown in Figure 35, which indicates
that for most targets the photometric centroid and PRF-fit methods are in agreement as
to whether there is a significant offset for a particular target. But there are many targets,
including a few identified APOs, that have photometric centroid source offsets < 3σ but
PRF-fit source offsets > 3σ and vice versa.

Quantitatively, for 54.9% of all KOIs the two techniques are in agreement that the
source offset is < 3σ; 24.7% of all KOIs have agreement that the source offset is > 3σ; 13.9%
of all KOIs have offsets > 3σ according to the PRF-fit technique but < 3σ according to
photometric centroids; and 6.45% of all KOIs have offsets < 3σ according to the PRF-fit
technique but > 3σ according to photometric centroids. Therefore the two methods are in
agreement on significance for about 80% of the targets. Most of the targets for which the
PRF-fit technique indicates an offset > 3σ but the photometric centroids have a shift < 3σ
have very small PRF-fit offsets, so they are at distances where residual bias dominates as
Fig. 34. — Uncertainty vs. SNR from the fits in Figure 33 plotted on linear scales. The dotted
lines indicate the range of variation due to the 1-σ uncertainties in the fit parameters.



discussed in §6.3.
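The agreement fractions quoted above amount to a two-by-two tally of which method reports an offset above 3σ. The following is a minimal sketch in Python of such a tally. The function name `significance_table` and its inputs are illustrative assumptions and are not part of the Kepler pipeline; the only numbers taken from the text are the four quoted percentages.

```python
import numpy as np

def significance_table(prf_sigma, phot_sigma, threshold=3.0):
    """Tally agreement between two offset estimates, both given in units of sigma.

    Returns the fraction of targets in each of the four significance categories.
    """
    prf_sig = np.asarray(prf_sigma) > threshold
    phot_sig = np.asarray(phot_sigma) > threshold
    n = prf_sig.size
    return {
        "both_below": np.sum(~prf_sig & ~phot_sig) / n,
        "both_above": np.sum(prf_sig & phot_sig) / n,
        "prf_only": np.sum(prf_sig & ~phot_sig) / n,
        "phot_only": np.sum(~prf_sig & phot_sig) / n,
    }

# Percentages quoted in the text; the two agreeing categories sum to about 80%.
quoted = {"both_below": 54.9, "both_above": 24.7, "prf_only": 13.9, "phot_only": 6.45}
agreement = quoted["both_below"] + quoted["both_above"]
print(f"agreement on significance: {agreement:.1f}%")
```

The four quoted percentages sum to slightly less than 100%, which is consistent with rounding in the source values.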



6.3. Residual Bias and High SNR Transits 

As described in §3.3.1, the computation of the PRF-fit source offset is subject to various
kinds of bias due to PRF error and crowding. When the transit SNR is high, both centroid
methods will have very high formal precision with very small uncertainties. The PRF-fit
source offset estimate essentially hits a noise floor, where the offsets are dominated by
residual biases. Figure 36 shows that this noise floor begins to be apparent at source offsets
of about 2 arcseconds, where there is a noticeable increase in objects with offsets between 3
and 4σ. Below about 0.2 arcseconds there is a large excess of objects with large offsets in
units of σ. The right panel of Figure 36 shows targets with high SNR. In this population
offsets are mostly very small, and we find most of the large excess of high-σ offsets. We
interpret this to mean that residual biases in the PRF-fit source offset are dominant under
0.2 arcseconds.

Figure 37 shows a similar analysis for photometric-centroid-based source offsets. The
excess of significantly offset targets is apparent but less severe in this case. 

We mitigate the impact of residual bias on small offset / high SNR targets in PRF-fit 
estimates of the source offset in two ways: 

• Adding a small constant "noise floor" to reflect the residual bias. Because bias seems
to dominate at less than 0.2 arcseconds, we want to avoid classifying any target with
a source offset less than 0.2 arcseconds as an APO false positive. Because this classification
is based on a $3\sigma$ threshold we add $\sigma_0 = 0.2/3$ arcseconds in quadrature to
the formal uncertainty in each component:
$\sigma_{\Delta\alpha} \rightarrow \sqrt{\sigma_{\Delta\alpha}^2 + \sigma_0^2}$,
$\sigma_{\Delta\delta} \rightarrow \sqrt{\sigma_{\Delta\delta}^2 + \sigma_0^2}$.
(This has the same effect on the offset distance uncertainty $\sigma_D$ as adding $\sigma_0$ to
$\sigma_D$ in quadrature.) The impact of adding this noise floor is shown in Figure 38.

• Special treatment is given to vetting targets with small source offsets. An example 
simple set of rules for manual vetting for false positives is the following: 

— pass all targets with offsets < 0.2 arcseconds (this happens automatically when 
using the above noise floor) 

— for targets with offsets < 1 arcsecond, manually investigate those targets with
offsets > 3σ

— for targets with offsets between 1 and 2 arcseconds, manually investigate those
targets with offsets between 3 and 4σ






Fig. 35. — A comparison of the PRF-fit source offset relative to the catalog position of
the target star (x-axis) and the photometric centroid source offset (y-axis), both in units
of σ. The vertical and horizontal lines mark where the offset = 3σ, above which the offset
is considered statistically significant. APO KOIs are marked by circles. We see that most
targets have both offsets below 3σ, but there are a significant number of targets for which
the photometric centroid source offset is less than 3σ but the PRF-fit offset is > 3σ and vice
versa.
Fig. 36. — The relationship between the PRF-fit source offset (x-axis) and source offset in
units of sigma (y-axis). Left: all KOIs. Right: KOIs with transit SNR > 100. On the left we
see that for offsets < 3 arcseconds there seems to be an excess of targets with offset > 3σ (red
line). On the right we see that for high SNR targets the offset is small, but there is an excess
of targets with offset > 3σ. This is likely due to residual bias from the errors discussed in
§3.3.1.
Fig. 37. — The relationship between the photometric centroid source offset (x-axis) and
source offset in units of sigma (y-axis). Left: all KOIs. Right: KOIs with transit SNR
> 100. Many KOIs fall outside the plot, but our interest is in small offset behavior. On the
left we see that for offset < 0.2 arcseconds there seems to be an excess of targets with offset
> 3σ (red line). On the right we see that for high SNR targets the offset is small, but there
is an excess of targets with offset > 3σ.
— for targets with offsets between 1 and 2 arcseconds, declare as APO targets with
offsets above 4σ

— for targets with offsets > 2 arcseconds, declare as APO targets with offsets above
3σ
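The noise floor and the example vetting rules listed in §6.3 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline implementation: the function names are hypothetical, the offset components are assumed to be in arcseconds on the sky with uncorrelated uncertainties, and the distance uncertainty uses standard first-order error propagation.

```python
import math

SIGMA_0 = 0.2 / 3.0  # noise floor in arcseconds, as given in the text

def offset_significance(d_alpha, d_delta, sig_alpha, sig_delta, sigma_0=SIGMA_0):
    """Return (offset distance in arcsec, offset in units of sigma).

    sigma_0 is added in quadrature to the formal uncertainty of each component.
    Assumes uncorrelated component uncertainties (an assumption of this sketch).
    """
    sig_a = math.hypot(sig_alpha, sigma_0)
    sig_d = math.hypot(sig_delta, sigma_0)
    dist = math.hypot(d_alpha, d_delta)
    if dist == 0.0:
        return 0.0, 0.0
    # First-order propagation of uncertainty for D = sqrt(d_alpha^2 + d_delta^2)
    sig_dist = math.sqrt((d_alpha * sig_a) ** 2 + (d_delta * sig_d) ** 2) / dist
    return dist, dist / sig_dist

def vet(dist, n_sigma):
    """Apply the example manual-vetting rules to one target."""
    if dist < 0.2:
        return "pass"
    if dist < 1.0:
        return "investigate" if n_sigma > 3.0 else "pass"
    if dist <= 2.0:
        if n_sigma > 4.0:
            return "APO"
        return "investigate" if n_sigma > 3.0 else "pass"
    return "APO" if n_sigma > 3.0 else "pass"

# A 0.15 arcsec offset with a very small formal uncertainty stays below 3 sigma
# once the noise floor is included.
dist, n_sigma = offset_significance(0.15, 0.0, 0.001, 0.001)
print(vet(dist, n_sigma))
```

With the noise floor in place, any offset below 0.2 arcseconds has a significance below 3σ, so the explicit `dist < 0.2` check in `vet` is redundant but is kept to mirror the first rule.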



Fig. 38. — The effect of adding a small constant to the PRF-fit source offset uncertainty
on the relationship between the PRF-fit source offset (x-axis) and source offset in units of
sigma (y-axis). Left: all KOIs from Figure 36. Right: the same targets with a constant 0.2/3
arcseconds added to the formal uncertainty in quadrature. The excess of targets exceeding
3σ at offset < 0.2 arcseconds has been removed.



7. Conclusions 

Many background astrophysical false positives can be identified through centroid analysis
of Kepler pixel data. The high photometric precision of the Kepler data provides opportunities
to identify such objects close to the target star, but great care must be taken to account
for various systematic biases. We have presented three different techniques, two of which
were analyzed in detail. This ensemble provides a powerful arsenal of tools for dispositioning
nearly all KOIs.

The PRF fit technique provides the best accuracy in the localization of transit sources 
that are not on the target star. The photometric centroid technique behaves best when 
the target star is isolated and the transit source is close to (or is) the target star. The 
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photometric centroid technique is therefore useful for confirming that the transit is on the 
target star when this is also indicated by the PRF fit technique. The photometric centroid 
technique can indicate when the transit source is separated from the target star, but when 
the separation is more than a few arcseconds the source location determined by photometric 
centroids is unreliable. 

When the SNR is low or there is significant crowding, the PRF technique can break 
down. In this case the photometric technique may provide the best evidence that the centroid 
is on the target star. The pixel correlation images can also be useful in this circumstance, 
though the pixel correlation technique is fragile. 

We find that we often use all three techniques when investigating a difficult target. This 
toolbox of techniques is a critical component of the Kepler planet candidate vetting process 
and makes a significant contribution to the reliability of the Kepler planet candidate list. 
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Facilities: The Kepler Mission 

A. Derivation of the formula relating centroid shifts to transit source location 

Assume that we are observing a target star with flux $b_0$ at $(\alpha_0, \delta_0)$, with $N$ nearby stars
at RA and Dec $(\alpha_j, \delta_j)$, $j = 1, \ldots, N$, and flux $b_j$. Assume the star $k$, with $k \neq 0$, is a
background eclipsing binary with fractional eclipse depth $d_{\mathrm{back}}$ (so the flux of star $k$ in mid
eclipse is $(1 - d_{\mathrm{back}})\, b_k$). We model the PSF of the star with a function $f(\alpha, \delta)$ that has the
following properties, where the integral is taken over the domain where $f > 0$:
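• $f(\alpha, \delta)$ has finite support ($f = 0$ outside of a finite area).

• $\int f(\alpha, \delta)\, d\alpha\, d\delta = 1$. In other words $f$ has unit flux so $b_j f$ has the total flux
$\int b_j f(\alpha, \delta)\, d\alpha\, d\delta = b_j$.

• $\int \alpha f(\alpha - \alpha_j, \delta - \delta_j)\, d\alpha\, d\delta = \alpha_j$ and
$\int \delta f(\alpha - \alpha_j, \delta - \delta_j)\, d\alpha\, d\delta = \delta_j$ so, for example,

$$
\frac{\int \alpha\, b_j f(\alpha - \alpha_j, \delta - \delta_j)\, d\alpha\, d\delta}
     {\int b_j f(\alpha - \alpha_j, \delta - \delta_j)\, d\alpha\, d\delta}
= \frac{\alpha_j b_j}{b_j} = \alpha_j,
$$

so the centroid of an isolated star is the same as that star's position.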

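We now consider an aperture on the sky that may not completely capture all flux from
stars in the aperture, and may contain flux from stars outside the aperture. Therefore
$\int_{\mathrm{ap}} b_j f(\alpha, \delta)\, d\alpha\, d\delta \neq b_j$,
$\int_{\mathrm{ap}} \alpha f(\alpha - \alpha_k, \delta - \delta_k)\, d\alpha\, d\delta \neq \alpha_k$ and
$\int_{\mathrm{ap}} \delta f(\alpha - \alpha_k, \delta - \delta_k)\, d\alpha\, d\delta \neq \delta_k$,
where $\int_{\mathrm{ap}}$ denotes an integral over the aperture. We model the background flux as an
arbitrary function $B(\alpha, \delta)$. We denote the total flux in the aperture by

$$
F^{\mathrm{ap}} = \int_{\mathrm{ap}} \left( \sum_{j=1}^{N} b_j f(\alpha - \alpha_j, \delta - \delta_j) + B(\alpha, \delta) \right) d\alpha\, d\delta .
$$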

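To simplify the following discussion, we define the notation

$$
\begin{aligned}
I_j^{\mathrm{ap}} &:= \int_{\mathrm{ap}} f(\alpha - \alpha_j, \delta - \delta_j)\, d\alpha\, d\delta, &
B^{\mathrm{ap}} &:= \int_{\mathrm{ap}} B(\alpha, \delta)\, d\alpha\, d\delta, \\
I_j^{\mathrm{ap},\alpha} &:= \int_{\mathrm{ap}} \alpha f(\alpha - \alpha_j, \delta - \delta_j)\, d\alpha\, d\delta, &
I_j^{\mathrm{ap},\delta} &:= \int_{\mathrm{ap}} \delta f(\alpha - \alpha_j, \delta - \delta_j)\, d\alpha\, d\delta, \\
B^{\mathrm{ap},\alpha} &:= \int_{\mathrm{ap}} \alpha B(\alpha, \delta)\, d\alpha\, d\delta, &
B^{\mathrm{ap},\delta} &:= \int_{\mathrm{ap}} \delta B(\alpha, \delta)\, d\alpha\, d\delta .
\end{aligned}
$$

So $b_j I_j^{\mathrm{ap}}$ is the flux from star $j$ in the aperture, $B^{\mathrm{ap}}$ is the background flux in the aperture,
and the superscript $\alpha$ or $\delta$ indicates the first moment in RA or Dec of these quantities. Then

$$
F^{\mathrm{ap}} = \sum_{j=1}^{N} b_j I_j^{\mathrm{ap}} + B^{\mathrm{ap}} .
$$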

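The out-of-transit centroid (including all flux in the aperture) is given by

$$
C_\alpha^{\mathrm{out}} = \frac{\sum_{j=1}^{N} b_j I_j^{\mathrm{ap},\alpha} + B^{\mathrm{ap},\alpha}}{F^{\mathrm{ap}}}, \qquad
C_\delta^{\mathrm{out}} = \frac{\sum_{j=1}^{N} b_j I_j^{\mathrm{ap},\delta} + B^{\mathrm{ap},\delta}}{F^{\mathrm{ap}}} .
$$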
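The in-transit centroid is given by

$$
\begin{aligned}
C_\alpha^{\mathrm{in}} &= \frac{\sum_{j=1,\, j \neq k}^{N} b_j I_j^{\mathrm{ap},\alpha} + B^{\mathrm{ap},\alpha} + (1 - d_{\mathrm{back}})\, b_k I_k^{\mathrm{ap},\alpha}}
                             {\sum_{j=1,\, j \neq k}^{N} b_j I_j^{\mathrm{ap}} + B^{\mathrm{ap}} + (1 - d_{\mathrm{back}})\, b_k I_k^{\mathrm{ap}}}
 = \frac{C_\alpha^{\mathrm{out}} F^{\mathrm{ap}} - d_{\mathrm{back}} b_k I_k^{\mathrm{ap},\alpha}}{F^{\mathrm{ap}} - d_{\mathrm{back}} b_k I_k^{\mathrm{ap}}}, \\
C_\delta^{\mathrm{in}} &= \frac{C_\delta^{\mathrm{out}} F^{\mathrm{ap}} - d_{\mathrm{back}} b_k I_k^{\mathrm{ap},\delta}}{F^{\mathrm{ap}} - d_{\mathrm{back}} b_k I_k^{\mathrm{ap}}} .
\end{aligned}
$$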

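The observed depth is defined so that the observed flux in mid eclipse is $(1 - d_{\mathrm{obs}}) F^{\mathrm{ap}}$.
Assuming that the eclipse is the only cause of a change in flux, the observed flux in mid
eclipse is also given by $F^{\mathrm{ap}} - d_{\mathrm{back}} b_k I_k^{\mathrm{ap}}$. Therefore
$(1 - d_{\mathrm{obs}}) F^{\mathrm{ap}} = F^{\mathrm{ap}} - d_{\mathrm{back}} b_k I_k^{\mathrm{ap}}$, so

$$
d_{\mathrm{obs}} = \frac{d_{\mathrm{back}} b_k I_k^{\mathrm{ap}}}{F^{\mathrm{ap}}} .
$$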



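The centroid shift is given by

$$
\begin{aligned}
\frac{\Delta C_\alpha}{\cos\delta} &= C_\alpha^{\mathrm{in}} - C_\alpha^{\mathrm{out}} \\
&= \frac{C_\alpha^{\mathrm{out}} F^{\mathrm{ap}} - d_{\mathrm{back}} b_k I_k^{\mathrm{ap},\alpha} - C_\alpha^{\mathrm{out}} F^{\mathrm{ap}} + C_\alpha^{\mathrm{out}} d_{\mathrm{back}} b_k I_k^{\mathrm{ap}}}
        {F^{\mathrm{ap}} - d_{\mathrm{back}} b_k I_k^{\mathrm{ap}}} \\
&= -\frac{d_{\mathrm{back}} b_k}{F^{\mathrm{ap}}}\, \frac{I_k^{\mathrm{ap},\alpha} - C_\alpha^{\mathrm{out}} I_k^{\mathrm{ap}}}{1 - d_{\mathrm{obs}}} \\
&= -\frac{d_{\mathrm{obs}}}{1 - d_{\mathrm{obs}}}\, \frac{I_k^{\mathrm{ap},\alpha} - C_\alpha^{\mathrm{out}} I_k^{\mathrm{ap}}}{I_k^{\mathrm{ap}}} \\
&= -\frac{d_{\mathrm{obs}}}{1 - d_{\mathrm{obs}}} \left( \frac{I_k^{\mathrm{ap},\alpha}}{I_k^{\mathrm{ap}}} - C_\alpha^{\mathrm{out}} \right),
\end{aligned}
$$

and similarly

$$
\Delta C_\delta = -\frac{d_{\mathrm{obs}}}{1 - d_{\mathrm{obs}}} \left( \frac{I_k^{\mathrm{ap},\delta}}{I_k^{\mathrm{ap}}} - C_\delta^{\mathrm{out}} \right).
$$

We define

$$
C_k^{\mathrm{ap},\alpha} = \frac{I_k^{\mathrm{ap},\alpha}}{I_k^{\mathrm{ap}}}, \qquad
C_k^{\mathrm{ap},\delta} = \frac{I_k^{\mathrm{ap},\delta}}{I_k^{\mathrm{ap}}},
$$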



which are the RA and Dec of the centroid of the flux of the transit source k in the aperture 
when all other flux is absent (alternatively this is the centroid of the difference image formed 
by subtracting in-transit pixels from out-of-transit pixels when all other flux is constant). 
Therefore this centroid is given by 

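$$
C_k^{\mathrm{ap},\alpha} = C_\alpha^{\mathrm{out}} - \left( \frac{1}{d_{\mathrm{obs}}} - 1 \right) \frac{\Delta C_\alpha}{\cos\delta}, \qquad
C_k^{\mathrm{ap},\delta} = C_\delta^{\mathrm{out}} - \left( \frac{1}{d_{\mathrm{obs}}} - 1 \right) \Delta C_\delta . \tag{A1}
$$

$(C_k^{\mathrm{ap},\alpha}, C_k^{\mathrm{ap},\delta})$ approximate the transit source location $(\alpha_k, \delta_k)$, with the error in this
approximation decreasing as more flux from the transit source is captured in the aperture. When
all flux from the transit source is captured in the aperture,
$(C_k^{\mathrm{ap},\alpha}, C_k^{\mathrm{ap},\delta}) = (\alpha_k, \delta_k)$.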
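Equation (A1) can be checked numerically on a synthetic pixel image. The sketch below is illustrative only and rests on several assumptions that are not in the paper: a Gaussian stands in for the Kepler PRF, the scene is a flat pixel grid so that the cos δ factor is 1, and the star fluxes, positions and eclipse depth are made-up values.

```python
import numpy as np

def gaussian_star(xx, yy, x0, y0, flux, width=0.8):
    """Pixel image of one star; a Gaussian stands in for the Kepler PRF (assumption)."""
    img = np.exp(-((xx - x0) ** 2 + (yy - y0) ** 2) / (2.0 * width ** 2))
    return flux * img / (2.0 * np.pi * width ** 2)

def centroid(img, xx, yy):
    """Flux-weighted (photometric) centroid over all pixels in the aperture."""
    total = img.sum()
    return (xx * img).sum() / total, (yy * img).sum() / total

def source_from_shift(c_out, c_in, d_obs):
    """Equation (A1) with cos(delta) = 1: infer the transit source centroid."""
    scale = 1.0 / d_obs - 1.0
    return tuple(o - scale * (i - o) for o, i in zip(c_out, c_in))

# Hypothetical scene: an 11x11 pixel aperture with the target at the centre and a
# fainter background eclipsing binary 2.5 pixels away with a 30% eclipse.
yy, xx = np.mgrid[0:11, 0:11].astype(float)
target = gaussian_star(xx, yy, 5.0, 5.0, flux=1000.0)
binary = gaussian_star(xx, yy, 7.0, 6.5, flux=50.0)
d_back = 0.3

out_img = target + binary                      # out-of-transit pixels
in_img = target + (1.0 - d_back) * binary      # in-transit pixels
d_obs = 1.0 - in_img.sum() / out_img.sum()     # observed depth in this aperture

c_out = centroid(out_img, xx, yy)
c_in = centroid(in_img, xx, yy)
estimate = source_from_shift(c_out, c_in, d_obs)
c_k = centroid(binary, xx, yy)                 # centroid of star k alone
print(estimate, c_k)
```

Because the centroids and the depth are measured in the same aperture here, the estimate reproduces the centroid of star k in the aperture up to floating-point error, which in turn lies close to the input position of the binary since nearly all of its flux is captured.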

In the Kepler pipeline implementation, the transit depth is estimated using the optimal
aperture (Bryson et al. 2010) while the centroids are measured using the optimal aperture
plus a one-pixel ring around the optimal aperture. This is because some optimal apertures
consist of only a single pixel, which cannot be usefully centroided. This use of one aperture for
centroid computation and a smaller aperture to estimate observed transit depth invalidates
the conclusion of the above analysis because $d_{\mathrm{obs}}$ in Equation (A1) is different from the depth
$d_{\mathrm{obs}}^{\mathrm{optAp}}$ determined using the optimal aperture.

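We can estimate the difference in these observed depths and predict the impact on
the estimated transit source position. For the aperture used for centroiding, we have the
relation $d_{\mathrm{obs}} F^{\mathrm{ap}} = d_{\mathrm{back}} b_k I_k^{\mathrm{ap}}$, while for the optimal aperture we have the same relation:
$d_{\mathrm{obs}}^{\mathrm{optAp}} F^{\mathrm{optAp}} = d_{\mathrm{back}} b_k I_k^{\mathrm{optAp}}$. Solving both relations for $d_{\mathrm{back}} b_k$ and equating, we find

$$
\frac{d_{\mathrm{obs}} F^{\mathrm{ap}}}{I_k^{\mathrm{ap}}} = \frac{d_{\mathrm{obs}}^{\mathrm{optAp}} F^{\mathrm{optAp}}}{I_k^{\mathrm{optAp}}}
\quad \Rightarrow \quad
d_{\mathrm{obs}}^{\mathrm{optAp}} = d_{\mathrm{obs}}\, \frac{F^{\mathrm{ap}}}{F^{\mathrm{optAp}}}\, \frac{I_k^{\mathrm{optAp}}}{I_k^{\mathrm{ap}}} . \tag{A2}
$$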

Because the optimal aperture is contained within the aperture used for centroiding,
$F^{\mathrm{ap}} / F^{\mathrm{optAp}} \geq 1$ while $I_k^{\mathrm{optAp}} / I_k^{\mathrm{ap}} \leq 1$. In the typical case where the background star is much dimmer than
the target star, $F^{\mathrm{ap}} / F^{\mathrm{optAp}}$ will be not much greater than 1, while $I_k^{\mathrm{optAp}} / I_k^{\mathrm{ap}}$ can be very
close to zero, for example when the core of star $k$ is in the pixel ring and only its wings
are in the optimal aperture. Therefore $d_{\mathrm{obs}}^{\mathrm{optAp}}$ can be much smaller than $d_{\mathrm{obs}}$, resulting in
a significant overshoot of star $k$'s position in Equation (A1). This overshoot is particularly
likely to happen when star $k$ is outside the optimal aperture, in other words for background
stars further from the target star. When star $k$ is brighter than stars in the optimal aperture,
including the target star, the overshoot is reduced because the flux in the aperture is
dominated by the flux from star $k$. When star $k$ is in the optimal aperture, the impact on
Equation (A1) is much less dramatic and it can provide a very good estimate of the transiting
star's position.
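The size of the overshoot can be illustrated by combining Equations (A1) and (A2). In the Python sketch below the function names and all input values are hypothetical, chosen only to represent a faint background star whose core lies in the pixel ring; they are not measurements from the paper.

```python
def optimal_aperture_depth(d_obs, f_ap, f_opt, i_k_ap, i_k_opt):
    """Equation (A2): depth observed in the optimal aperture."""
    return d_obs * (f_ap / f_opt) * (i_k_opt / i_k_ap)

def overshoot_factor(d_obs, d_obs_opt):
    """Ratio of inferred to true source offset from the out-of-transit centroid
    when d_obs_opt is used in Equation (A1) in place of d_obs."""
    return (1.0 / d_obs_opt - 1.0) / (1.0 / d_obs - 1.0)

# Hypothetical values: 1% depth in the centroiding aperture, 5% more total flux in
# the centroiding aperture than in the optimal aperture, and only 10% of the flux
# that star k contributes to the centroiding aperture falling in the optimal aperture.
d_obs = 0.01
d_obs_opt = optimal_aperture_depth(d_obs, f_ap=1.05, f_opt=1.0, i_k_ap=1.0, i_k_opt=0.1)
factor = overshoot_factor(d_obs, d_obs_opt)
print(d_obs_opt, factor)
```

For these assumed inputs the inferred offset is close to ten times the true offset, which shows how a small flux fraction in the optimal aperture translates almost directly into the overshoot factor when the depths are small.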
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